Showing posts with label software process.

Monday, March 23, 2009

Evolutionary Architecture

Agile Architecture, Round 2. As I (and others) have said, "Do The Simplest Thing That Could Possibly Work" applies to more than your software. So today, we'll look at the common growth process for a system, and some of the constraints that limit each phase. Our example is a web-based, database-driven application, but the pattern applies to any system with persistent storage. (Thanks to mxGraph for the images.)



Step 1 is the absolute basic setup: everything running on one server. One webserver, one database backend. I suppose Step Zero would be no database, just files or something, but even that trivial example is enough to get us off the ground. No failover, no backup, no load handling? No problem! I would not advise using this type of system in a live environment, but for prototyping and identifying performance bottlenecks, it's just fine. Get something up and running, then see how it behaves. The ultimate limit of this configuration is how big a box you can get your hands on, and most likely one giant enterprise server would not work as well as Step 2, for reasons that will be explained.

This also would be a good place to mention the value of monitoring and logging at this level. In order to identify bottlenecks, you need to be able to see how your system responds to load.



Step 2 is a very common setup. It's not that much different from Step 1, except that the database and webserver are on physically separate machines. This is the first step toward separating by function, and the functionality is as coarse-grained as it can be at this point. You could just as easily add a clone of the first machine, but then you'd have two sets of processes to manage, which we will address in the next iteration. This setup lets us tune each machine for its specific task: serving the application or serving database access.

By introducing a second machine, we've now introduced a synchronization issue between them. If one machine goes down, the other needs to be able to handle that somehow. Fail the request? Queue and retry? The answer is "it depends" on the situation being addressed. When serving webpages, the user may be willing to submit their query again; storing credit card transactions may need to be more robust. But we can handle the problem with the next step:



Step 3 could theoretically be scaled out to hundreds or thousands of servers, and Step 4 is really a subdivision of Step 3. Notice that the number of servers is growing exponentially: we started at 1, then added a second, and now we're managing (at least) 4. The cluster implementation can be as simple or as complicated as you make it - a J2EE container-managed cluster or simple proxy software. The important things with this step are:
  • you can now scale to as many servers as you need, quickly and simply.
  • you've removed almost all failures from impacting availability.
  • you're still constrained by the database master server.
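As a sketch of the "simple proxy software" end of that spectrum, here is a minimal round-robin backend chooser in Java; the class name and the idea of picking by index are illustrative only, not taken from any particular proxy product.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Toy round-robin selection: each incoming request is handed to the
 * next backend in the list. Hostnames and class name are made up.
 */
public class RoundRobinBackends {

    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBackends(List<String> backends) {
        this.backends = backends;
    }

    /** Returns the backend that should receive the next request. */
    public String pick() {
        int index = Math.abs(next.getAndIncrement() % backends.size());
        return backends.get(index);
    }
}
```

A real proxy would also health-check the backends and drop dead ones from the rotation, which is where container-managed clustering starts to earn its keep.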
This step assumes that you're still dealing with fairly homogeneous data at this point: all the data coming and going from the servers can be handled similarly. This type of layout is pretty common among the Less Than Huge websites, that is, excepting the eBays, Facebooks, and Googles of the world. Once you get that huge, you will probably need a unique approach to handle your traffic anyway, but this will get you 99% of the way there. At this point, you've already addressed pushing updates to your servers and handling inconsistency between them (in the previous step). But let's say your monitoring indicates that you're writing more data than one write-master database can keep up with, or that you've identified some other major performance bottleneck in the system. That's where Step 4 comes into play:



We're now, finally, at a point that could be considered "architecture." Each type of process is located on its own server, connecting to a database that contains the data it needs to perform that function. This is the point where de-normalization may come into play: the minor performance/space hit taken by denormalizing is offset by no longer needing interconnections between the databases. Also at this point there may be additional vertical stacking, separating the presentation layer from specific data-processing. Now we're into the classic 3 (or n)-tier model, but the internals of those layers can scale as large as we want (within some physical limits).

So to sum up, your architecture should "grow" along with your application. It should provide a framework that allows your application to handle growth, not restrict it to growing along a specific path.

Tuesday, March 10, 2009

Schedule, LOC, and Budget? You're Lying

Software engineering is an imprecise science. Anyone who tells you differently is either not working very hard or doesn't understand that fact (or both). Actually, calling it engineering is a bit of a misnomer itself, since the engineering work differs from system to system as much as engineering a submarine differs from engineering a bridge. Engineering involves creating repeatable systems of common features: there is a set of rules for building bridges that has little in common with the rules for building submarines.

The problem with defining software as engineering lies in identifying the dividing line between engineering and manufacturing. Determining the amount of effort (time, code, and people) for software is much less defined than in traditional "peoplespace" engineering because of the fluidity of the tools that are used. Imagine writing a proposal to build the first transcontinental railroad: 50 miles per year, thousands of workers. Now imagine the proposal today: much more production, far fewer workers. Computers allow the creation of these monstrously efficient production mechanisms. Hence the statement "hardware is cheap, people are expensive."

Looking at schedules, we see there are two types of release schedules: time-constrained (release on a certain date, like Ubuntu) and effort-constrained ("all these features are in release X," like, say, Apache httpd). Time-constrained releases ship on a set date with a variable number of features; effort-constrained releases ship when the set of features is done. Neither mechanism tries to pin down people, features, and dates all at once. So either you release on a scheduled date with whatever is done at that time, or you release when everything is done, regardless of date.

It would be silly to create a release schedule that released every 10,000 lines of code, but that's what our snake-oil salesmen are proposing. Here's how it works: a budget is calculated from the estimated lines of code times the number of man-hours it would take to create them, based on historical estimates of productivity. So, like Earl Scheib, you're saying "I can build that system with 100,000 lines of code!" Calculating the budget is now a function of hourly billing rate, productivity, and estimated lines of code.

Here's an example: the customer says "Build me a new SuperWidget." The contractor looks thoughtful and says "We think SuperWidget should take 50,000 lines of code, since it's kind of like the TurboSquirrel we built before. Our normal efficiency is 1 LOC per hour (seriously), and our billing rate is $200 per hour, so the budget is $10 million. There are 2,000 man-hours in a year, so we need 25 people and you can have it in one year." There's your budget, schedule, and estimate, all rolled into one; if one changes, the others have to change as well. SuperWidget is a large project, obviously.
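To show how little is behind that quote, here's a throwaway sketch of the same arithmetic; the constants are the made-up numbers from the example above, nothing more:

```java
/** Back-of-the-envelope version of the SuperWidget estimate above. */
public class LocEstimate {
    public static void main(String[] args) {
        double estimatedLoc    = 50000;   // the guess
        double locPerHour      = 1.0;     // the productivity fudge factor
        double ratePerHour     = 200.0;   // billing rate in dollars
        double hoursPerManYear = 2000;

        double hours    = estimatedLoc / locPerHour;   // 50,000 hours
        double budget   = hours * ratePerHour;         // $10,000,000
        double manYears = hours / hoursPerManYear;     // 25, i.e. 25 people for one year

        System.out.printf("budget=$%,.0f  staffing=%.0f man-years%n", budget, manYears);
    }
}
```

The entire output hangs on the two numbers that were guessed: the LOC count and the productivity figure.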

That's seriously how it works. Oh sure, there's supposed to be more analysis to determine how similar the project will be to previous projects, but the number is still, at best, a guess. It's not like building a bridge, where you can estimate the amount of steel and concrete fairly well; software simply isn't bound to physical properties.

So how do you get this model to work? The "fudge factor" is the productivity. You set your productivity so low that you know, absolutely, you won't miss the estimate. Why else would you claim to produce only 1 LOC per hour? StarOffice, considered one of the largest open source projects, is estimated at around 9 million LOC. That's 4,500 man-years, using our previous figure of 1 LOC per hour. 4.5 years with 1,000 developers, or 45 years with 100 developers. Obviously, something else is going on. Estimates show that productivity can be around 10 times our figure. But how do you get there? That's the topic for next time.

Thursday, February 19, 2009

The Blob Problem

No, that's not a typo of Blog. The Blob Problem goes hand in hand with scope creep. As a project grows, it grows from a well-defined box into a poorly defined blob. The box, and blob, correspond to the answers to the questions "What does it do?" and "How does it do it?" From the beginning, the box clearly defines the application's domain, and the volume inside the box (I'm already stretching the metaphor) is open and visible.

As the project grows, "dark areas" of the system appear, and the box becomes less transparent, more translucent. It grows out of its original area, and starts stretching to join other areas of interest. Rather blob-like. Over time, the function and scope of the original project has morphed into something largely different.

So what do you do? As always, Do The Simplest Thing That Could Possibly Work. So the answer is, "It depends." But do what works for you. I'm personally fond of a wiki with some design documentation in it. The debate about whether UML fits into DTSTTCPW still rages on, but I use UML more as a sketch tool. I don't care whether it's formally correct, or whether every aspect of the design is analyzed. If it explains "X talks to Y" or "X does A, B, then C" better than words would, then it satisfies DTSTTCPW.

Your system is still a set of Legos. Once it becomes one large Lego, its internal shape needs to be described, so that someone can use it and/or work on it without a lot of extra effort. The amount of documentation that is needed is surprisingly low, and for good reason: too much documentation is worse than too little. Making developers plow through a mountainous tome of documentation wastes their time, and creating, reviewing, and maintaining that stack wastes everyone's time.

Monday, September 29, 2008

Where Does The Time Go?


When you start a project, don't you marvel at how efficient you are? How much you can get done in so little time? Why can't we always be that efficient? There are a number of reasons why efficiency drops off. Producing documentation (oh, you want someone else to help you, or you want someone else to use your project?), meetings (you need to work with these other people?), and bug fixes are the primary detractors from producing new functionality.

I've said before that Lines of Code is a horrible metric for measuring software development. It's like measuring hammer strokes in building a house, or the number of licks it takes to get to the center of a Tootsie Pop. So I will be using percentage of time as a virtual metric: what percentage of time is spent on which task is a much more useful measure than counting the number of nails used in a wall.

When you start a project, obviously most of your time is spent building new features. There may be some note-taking along the way to help you remember what you're doing, but really you're just trying to produce something that works, to solve your original problem. As you move along, an equilibrium will develop between new code, bug fixes, documentation, and meetings. You will wind up with a graph that looks like this:

Good amount of new growth, a reasonable amount of bugs being fixed, not too much other stuff. Enough documentation to be useful, but not so much as to be a mountain. A few meetings to discuss what's going on and how it's going.

Compare to these two horror cases:

Given that there are only 24 hours in a day (please don't let my boss find that out...), every additional chunk of time spent doing something else is time that can't be spent building software. So how do we maximize the time spent building software, and find the sustainable balance between too much documentation and too little? That's a topic for another day.

Monday, August 25, 2008

Are You Wasting Time?

No, I don't mean reading my blog or surfing the internet or watching youtube videos. I mean even when you are working, are you spending time doing repeatable tasks that could be automated, and are you doing tasks that do not contribute to the development of the product?

The first question is the easy one to ask, but it can be a hard one to fix. For example, does your build require you to "do something" by hand to go from the built baseline to a running executable you can test or deploy? If so, what's holding it up? Is the build simply not set up to do that, or is some portion of your system not configured for it? If it's the second case, that's a bigger problem, but it can still be overcome by working with your system administrator and addressing the holdup.

The second question is the bigger task, and will usually require buy-off from your program Higher-Ups. I see this problem as a corollary to the Agile Manifesto, which states (among other things) "Working software over comprehensive documentation." I like to sum that statement up simply as "Does this help me do my job? Would this help someone else do my job?" If the answer is "No," don't do it. You can quickly see where that would be a problem with the Higher-Ups, and the "That's the Way We Do It" crowd.

Another corollary to the Agile Manifesto is "It's OK to Screw Up": do something, give it to the users, and they'll tell you what they like and don't like about it. But do it fast enough that you have time to fix the problems and add the new features. This can't be done in a rigid environment, where it takes weeks just to get a fix to the users, never mind the time it takes to develop a new feature. To do it, and do it fast, and get it right, actually takes less overhead than you'd think.

Friday, August 22, 2008

Building Blocks or Jigsaw Puzzle?

Is your system building blocks for a larger system, or jigsaw puzzle pieces to build the same system? A puzzle piece fits into one and only one spot. A building block will fit into one spot, but can fit into other spots as well. I also like to think of it as a "spiky vs. smooth" interface.

What makes good building blocks? The obvious buzzwords "extensible, re-usable, decoupled, cohesive modules" apply, but it's easier to look at it this way: how hard is it to add new data to the system, and how hard is it to make the system do something else? If the answer to either is "not easy," then you're building jigsaw puzzle pieces. Look at your method signatures. Do they perform specific functions on specific types of data? Those are puzzle pieces. Do they perform specific functions on generic data, or (better still) generic functions on generic data? Those are building blocks.

So what makes reusable, extensible data objects? Obvious answers are object inheritance and using a syntax like XML. But no matter how you do it, you need to identify similarities between data and only handle unique data in special cases. Abstraction of data into generic objects for modeling and design helps describe the mappings between similar data.

Here's an example of what I'm talking about. How many times have you seen "The Bank Example"? Now imagine that example built to handle one very specific type of currency and a couple of specific use cases:
depositDollar
and
withdrawDollar.

Simple, straightforward, and almost totally non-reusable (without refactoring). The use cases would be diligently mapped to methods named
depositDollar(Dollar dollar)
and
Dollar withdrawDollar()

in class AccountAccess. Dollar, being a good data class, would contain all the useful metadata about your dollar: serial number, creation year, number of wrinkles, that sort of thing. On the server, the business logic would need to store the Dollar in and withdraw the Dollar from the Account, and do all the associated checking that goes along with it. So now, how do we add a new data type or another operation? Yup, create a new use case and a new method. Repeat ad infinitum (or at least ad nauseam).

Now we've got a system doing very similar things to very similar data, but it's hard to refactor because we're used to thinking about the specific types of data, as if a Dollar were somehow different from a Quarter or a Euro. The only thing that can pull us out of this mess is to re-evaluate the data and, more importantly, what the users want to do. They want to put money into their account and withdraw money from their account. Now, generically, we can use the methods
deposit(Money amount)
and
Money withdraw(Money amount)

where Money is, of course, an interface that Dollar, Pound, Euro, etc. all implement.
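Here's a minimal sketch of where that refactoring lands, assuming a hypothetical Money interface and Account class; the field and method names are illustrative, not from any real banking API.

```java
import java.math.BigDecimal;

/** Generic money abstraction; Dollar, Pound, Euro, etc. just implement it. */
public interface Money {
    BigDecimal amount();
    String currencyCode();
}

/** One concrete currency; nothing about it leaks into Account. */
class Dollar implements Money {
    private final BigDecimal amount;
    Dollar(BigDecimal amount) { this.amount = amount; }
    public BigDecimal amount() { return amount; }
    public String currencyCode() { return "USD"; }
}

/** The account no longer cares which currency it is handed. */
class Account {
    private BigDecimal balance = BigDecimal.ZERO;

    public void deposit(Money money) {
        balance = balance.add(money.amount());
    }

    public Money withdraw(Money requested) {
        // real code would check funds and handle currency conversion
        balance = balance.subtract(requested.amount());
        return requested;
    }
}
```

Adding a Euro or a Quarter now means adding one small class, not a new use case and a new method on AccountAccess.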

Now, I realize that there will have to be some "business logic" in the system; otherwise the system doesn't do anything unique. But the amount of uniqueness should be low, and it should decrease, not increase, as the system expands.

Wednesday, August 13, 2008

Branches, Refactoring, and Deprecation

One of the biggest hurdles in refactoring is answering the question "How do I merge changes into something that's no longer there?" I will attempt to lay out a useful strategy for addressing that problem.

In our example, we've got a class that has a number of methods. The number of methods continues to grow (and the amount of logic in those methods grows as well), to the point that the class is unmanageable. It's no longer clearly defining its task (if it ever did), and has wandered off into areas that are outside its scope. But, because of its importance to many areas of the system, and because of its poorly defined task, it has a lot of bugs in it. The bugs may be from improper implementation, or incomplete analysis. Either way, there's a lot going on in here.

The obvious answer is "Refactor it." However, that's not going to be easy, since there are still bugs that need to be fixed during and after refactoring. How do we deal with this? With a cunning use of deprecation and branching.

Step one is to create a branch for the refactoring. The syntax and details of doing this are different for every version control system out there, so I won't get into them. Now you've got an area set aside just for refactoring, free of outside influence, where you can verify that you haven't broken anything through automated or manual regression testing.

Step two is to deprecate the methods or classes that you've modified during refactoring, but don't remove them! That keeps a merge point for you to pick up any additional changes that happened during refactoring and testing, and it informs other developers that the class is about to change and the current method is no longer supported. In addition, a beneficial practice when deprecating (I'll use Java as an example, since I'm most familiar with it) is to include the @see tag to point the other developers at the right method.
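As a hedged illustration (the class and method names are invented for this example), the deprecated method can simply delegate to its new home, with the Javadoc pointing other developers there:

```java
/** New home for the logic (hypothetical name). */
class ReportGenerator {
    void generate(String user) { /* refactored logic lives here */ }
}

/** The bloated original class, mid-refactoring. */
public class LegacyManager {

    /**
     * Old entry point, kept only as a merge target while branches catch up.
     *
     * @deprecated logic has moved to {@link ReportGenerator}; this method
     *             will be deleted once every branch has migrated.
     * @see ReportGenerator#generate(String)
     */
    @Deprecated
    public void doEverything(String user) {
        // delegate so existing callers keep working while they migrate
        new ReportGenerator().generate(user);
    }
}
```

Keeping the old method as a thin delegate means merges from other branches still have somewhere to land, and the compiler's deprecation warnings do the nagging for you.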

Step three is to merge any changes into your branch. This will be a two-step process: merge the changes into their original spot, then apply them to the refactored locations. This may seem time consuming, but it is really the only way to ensure that your branch is up to date before merging into the main line. Re-run any tests for the changes to make sure you merged them correctly (at this point, the benefit of automated testing over manual testing should be obvious). Now you can merge your branch into its parent.

Step four is only necessary if other people are working on their own branches as well. After enough time has passed that all the branches have picked up the refactored class, you may delete the deprecated methods or classes. This waiting period allows the other branches to migrate to the new organization at their own pace. They will have to do the same thing you did before merging: apply any changes they've made to the original method to the refactored one.

Tuesday, August 12, 2008

Development by Telephone Game

Does your development process have a lot in common with the Telephone Game (also called Chinese Whispers or Russian Scandal)? I mean, how many steps, and how many people, are involved in getting what the users want into their hands?

Do the users tell their management, who tell the prime contractor's systems engineers, who tell your systems engineers, who tell you what to build? Then do you hand your build off to the integration team, who gives it to the maintenance team, who gives it to the users? Let's see, that's seven steps between the users and the end product.

That's one key area where open source does it right, because it has to. The user submits a bug or new feature to the developers. The developers fix it or build it and give it back. That's one step (OK, there are probably a couple of internal steps for reviews and testing, but I overlooked those in the first example, so I'm overlooking them in this example too). And funnily enough, it creates a stable product that delivers the functionality the user wanted faster than "doing it the right way."

Are you naive enough to think "But think of how good the product would be if they did it our way"? The reason open source works (and works fast) is that there aren't multiple steps between the user and his product. The best tools are the ones used BY the developers (see GCC, for example) during development. If there's uncertainty, the developer talks to the user. There's no "pushing issues up the chain" and waiting for resolution. There's just a need, and the fulfillment of that need.

So call it Agile, call it XP, call it whatever you want. But if you base your development around creating user functionality instead of data functionality or system functionality, you will produce more user-usable functionality faster, simply because everything you build addresses some user need.

Wednesday, July 23, 2008

Agile Architecture

Agile (or eXtreme Programming) development usually focuses on the development of the software itself, but it can (and should) be applied to all parts of the system. If the architecture of the system is determined before the first iteration and is not open to refactoring or other redesign, then it falls into the category of BigDesignUpFront. To put the problem another way: how can the domain of the system be determined if what the system will do is not yet determined? And if the system functionality is already determined, then there's no room to DoTheSimplestThingThatCouldPossiblyWork.

One argument against DoTheSimplestThingThatCouldPossiblyWork is that nothing complex (like a compiler) would ever get built. I disagree, simply because DTSTTCPW should only be applied to the domain as it is currently understood. So TheSimplestThing when you are getting started should be different than TheSimplestThing as your system becomes operational. TheSimplestThing when you are working in isolation on your personal computer is massively different from TheSimplestThing when you are providing a service on the Internet. As a ridiculous example, think about the problem of searching for a given word in your filesystem. Would you build Google to do it? No (or I sincerely hope not). You'd use find or perl or some other scripting language. Now, if you want to let anyone search for any word in any document on the Internet, you've got to build Google, and all the system availability and performance benchmarks that go along with it.

The great thing about Web technologies is that they scale well and don't require wholesale replacement to meet greater demand or performance objectives. If one webserver can handle your own requests, then a cluster can handle thousands of them. But how do you refactor an architecture? Most people think of architecture as an inflexible part of the design. Nothing could be further from the truth. Writing POJOs (Plain Old Java Objects) leads quickly into EJBs (Enterprise Java Beans) in a J2EE container, or into web services. The same goes for any web-based scripting language: command-line PHP can easily be hosted by Apache and serve the same data over the web.
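As a rough illustration of that "POJOs lead into EJBs" point: the same plain class becomes a container-managed session bean by adding a single EJB 3 annotation. The class below is a made-up example, assuming the EJB 3 API is on the classpath:

```java
import javax.ejb.Stateless;

/**
 * Starts life as a POJO you can unit-test or call from main();
 * the @Stateless annotation is all it takes for a Java EE container
 * to manage pooling, transactions, and remote access later on.
 */
@Stateless
public class PriceCalculator {

    public double totalWithTax(double subtotal, double taxRate) {
        return subtotal * (1.0 + taxRate);
    }
}
```

Nothing about the business logic had to change to move it from a single machine into a container; that is exactly the kind of architectural refactoring the paragraph above is describing.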

The key point is that just as your software should be designed iteratively, so should the architecture. As the system requirements grow and change, the architecture should grow and change with them. Will there be additional software changes to handle the new architecture? Probably. Will the overall system be better suited to handle the system parameters? Yes. Otherwise, you will be trying to put a now-square peg into a hole that's still round. It can be done, but it sure isn't TheSimplestThing.

Monday, July 21, 2008

The Network Is The Problem

The first goal of any architecture should be to address two questions: how does the user interact with the system, and what does the system do when the user can't connect? Obviously, the goal of the system is to provide as much connectivity as possible, but there will always be potential problems between the client and the first network connection. The system will need to address how to handle data that does not make it to the server (like Blogger, for example).

How the user interacts with the system will affect how the system handles data when the network is down. Of primary importance is what to do with any data that would have been transmitted from client or server, and how to recover once connectivity is restored. For a time-critical system, an additional decision must be addressed: what to do if the request isn't received in time. If the request can't be fulfilled, does it still have value?
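One hedged sketch of that recovery path, with invented names: hold unsent requests in an outbox and replay them when the connection comes back. A time-critical system would also decide, at the marked spot, whether a stale request is still worth sending.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Client-side sketch: hold requests while the network is down and
 * replay them when connectivity returns. Names are illustrative.
 */
public class Outbox {

    private final List<String> unsent = new ArrayList<String>();

    public void send(String request, Connection connection) {
        if (connection.isUp()) {
            connection.transmit(request);
        } else {
            unsent.add(request);   // decide here: hold it, or discard it if it will be stale
        }
    }

    /** Call when the client detects that the connection is back. */
    public void replay(Connection connection) {
        for (String request : new ArrayList<String>(unsent)) {
            connection.transmit(request);
            unsent.remove(request);
        }
    }

    /** Minimal stand-in for whatever transport the system actually uses. */
    public interface Connection {
        boolean isUp();
        void transmit(String request);
    }
}
```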

By focusing on the user input, you can keep your project in scope. If you start looking at data, then you can start looking at all the places that data can go, and all the systems that need to interact to deal with the data. By focusing on what the user needs, rather than what the system requires, you can iteratively add more and more features, and rebuild the system as new capabilities are required for those features.

A trivial example: a user needs to know what his work assignments are. So he 1) logs into the system (define the scope of the user environment, address network connectivity - see, I told you it would be important), 2) requests his work (pre-defined data, or a manual query?), and 3) displays the data (what kind of display is available?).

Now, once those decisions are addressed, they don't need to be addressed again when the user needs to create a new work assignment. Rather than attempting to figure out the entire system up front, the system can provide a needed piece of functionality quickly. Once a decision is made, it can be applied to all other functionality in the system.

Monday, July 14, 2008

Agility 101: Refactor Your Build

One of the problems with using an adjective like "agile" or "extreme" (ugh) is that it implies a degree, as in "A mouse is more agile than an elephant." But you could also say the elephant is more agile than, say, an aircraft carrier. So, right off the bat, agile seems open to interpretation, whereas Rational Unified Process or Waterfall have well-defined steps that you either are doing or aren't doing. That perception is wrong. Agile has a very well-defined set of practices, but since the term lends itself to statements like "We need to add some agility," it is working at a disadvantage.

The primary goal of agile development is continuous, sustainable development. If that is not possible in your current environment, ask yourself why not. A primary area for improvement is the build/deploy/test cycle. If it takes hours to build, deploy, and coordinate testing, then you've reduced the amount of time available to do anything useful. Some important points to consider:

  • How many steps does it take to go from build to running system? If it's very many, re-examine all the functionality that Ant provides.
  • Do multiple versions of the system potentially interfere with each other? If so, figure out a way to run multiple versions in isolation. As an additional point, can multiple versions of the same software co-exist on the same server or in the same environment? If I need to care what you are doing, that's going to take time to rectify.
  • Can I build and test multiple branches without interfering with a previous build? To reiterate the previous point, if it takes time to reconfigure and set up to test a fix or enhancement, that's time that can't be used for coding.


Given the capabilities of a tool like Ant, there's no reason that a complete, executing system cannot be produced out of every build.

Tuesday, June 24, 2008

Improve Product, Not Process

Believing that process improvement will improve the product is like believing that spending more time looking at the map before a trip will prevent you from running into a traffic jam along the way. Process improvement creates paint-by-numbers; it creates fast food. Which is fine if that's your goal, but don't expect anything other than paint-by-numbers or fast food. Otherwise, we're treading into Einstein's definition of insanity: "Doing the same thing over and over and expecting different results."

If you feel you must improve your process, plan your process for change. There are too many uncontrollable variables to have any expectation of a successful delivery with a single waterfall method. So plan your process for more frequent changes, more frequent reviews, and more frequent releases. Hmm, this sounds like agile (without a capital A).

Tuesday, June 17, 2008

Just Because You Can, Doesn't Mean You Should

A note on the Agile adage "Do the Simplest Thing That Could Possibly Work" (DTSTTCPW): DTSTTCPW does not mean "Do the Easiest Thing," or, put another way, "Just Because You Can, Doesn't Mean You Should." (Hey look, a title!) There's a fine line between those two ideas, because the Simplest Thing for one person isn't necessarily the Simplest Thing for everyone else involved. Reductio ad absurdum example: suppose someone keeps the requirements in a Word doc on their desktop. Anyone who wants to know what they have to do has to go look at that one person's desktop. That's surely the Simplest Thing for that one person, but it is definitely not the Simplest Thing for everyone.

So, as usual, we have a problem. How do we figure out the Simplest Thing when it comes to organization and communication? Here are some guidelines:
  • if you have to search for it, it's not in the right place
  • if you have to use different tools, you don't have the right tool
  • if you have to ask, it's not documented well enough
  • if you have to change the process to get something done, the process needs to change
The first point will stop the practice of "Put it on the data server," in an unnavigable directory structure that combines software, business development, schedule, test, birthday lists, and everything else under the sun. Do you have a pile of bookmarks and shortcuts to help you navigate your data drive? So do I. This can be traced back to the project not being focused on the Product, but on everything else that goes into the Product. If the data is not relevant to the person who has to wade through it, get rid of it (or at least put it somewhere else). SourceForge once again comes up as the model: all the documentation related to the Product is stored there. Want to use it? Read the documentation. Want to see the bugs? Look in the tracker. Want to help? Download the source from Subversion or CVS. It's a very simple idea that's very easy to overlook: The Product is the Project (hopefully Scott McNealy will forgive me).

A documentation system based on a file system just isn't effective anymore. Sure, it works, much like any outdated technology still works; there's just a much better way to do it.

Tuesday, June 10, 2008

The Simpsons as a Software Model

Like most things in life, The Simpsons show us the way. Or, rather, in this case, the way NOT to do something. In "Oh Brother, Where Art Thou?" Homer gets to request all the features he wants in his dream car; in effect, he gets to design the car himself. You can see the result on the left. That's what happens when our customers get all the features they ask for.

Our end product ends up looking like that: lots of features "stuck on" with very little concern for the overall design. Sure, it will work, and come out as a finished product, but the true art of software comes from taking all those features and still producing a cohesive, solid design. Like most designs that are "customer-driven," it is way over budget (at least it's not behind schedule as well). Now, if you're in a market where you can continue to charge the customer to make up the budget, that's a feature, and probably a goal. However, in the real world (and the Simpsons' world as well), an ugly, poorly designed product won't sell.

So how do we get from Homer's vision (which, incidentally, he was quite pleased with) to a mass-market-friendly one including all the features the customer requested? One simple step that seems to be largely overlooked is saying "No." I can't believe I even have to write this down, but if the customer makes an outrageous request, saying no (and explaining what we should do instead) will cut off the largest potential budget and schedule over-runs. For example, I'm currently supporting a custom data synchronization engine for my current project. We are using a database that supports replication, but that capability is not available to us. If we'd just said "No" (and for the record, I did say "No," but my managers didn't care, because they didn't want to say "No" to our customer), we would have been able to use the built-in replication. But we didn't. C'est la vie.

Thursday, June 05, 2008

A Software Fable

Gather 'round, kids. It's story time. There are so few fables written nowadays, and few of them apply to software (if you read between the lines). So here goes.

Once upon a time, two groups of engineers were out for a walk. They came upon a creek and, being engineers, decided to dam it. The first group set to work immediately, measuring the depth and width of the creek and calculating flow rates and how high the water would rise once dammed. The second group looked at the creek and started throwing rocks into it. The first group, having taken all their measurements, set off to carve a rock big enough to block the creek, leaving the second group still throwing rocks.

Time passed, and the first group came back with their huge rock. But unfortunately for them, the rain had turned the creek into a raging river, and their rock was still not big enough! Or it would have been a raging river, if it had been flowing. To their surprise, they saw a dam built out of the rocks that the second group had been throwing! The first group was amazed that something so big had been built from so many little rocks, so they went and asked the second group how they did it. The second group said, "Some of us looked for more rocks to put in a pile, and the rest of us threw the rocks from the pile onto the dam. So while you were measuring and calculating, we were already solving the problem. If anything goes wrong with the dam, we just need to throw more rocks. If it rains more, we just need to throw more rocks. You need to carve a whole new rock."

Many morals apply to this story: do the simplest thing that could possibly work; don't over-engineer the solution. But the one I like is this: if you leave engineers in the woods for too long, this kind of thing is bound to happen.

Wednesday, May 14, 2008

Setting the Bar Low

I work in a company that is primarily concerned with doing just enough to get the next contract. That is demotivating for any employee who wants to keep up with technology or branch out into a new area of development. The problem with that mindset is that it is difficult to convince anyone that any change is a good idea. You will be met with answers such as "It worked fine (we delivered something on time), why change?" and, more simply, "That's the way we do it."

That's a perfectly fine mindset if you are working on an assembly line. But, as I've stated repeatedly, treating software as simple production is to do it a disservice. More importantly, the company will lose the top developers who quickly tire of doing the same thing over and over. This is not change just for the sake of change; rather, it is change for product improvement: doing it better, and/or faster. Take the old vi argument. Sure, you can develop software in vi, but I can do it much faster with NetBeans or Eclipse. But the metrics are already skewed toward the development rate with vi, so I've now got free time on my hands. What to do with it? I'd like to work on something else, investigate something new, or otherwise improve the product in development, deployment, organization, or maintenance. But I'm told I can't, because "There's no budget" or "You're not authorized." So we keep on happily cranking out buggy whips (no pun intended).

Einstein's old adage "Insanity is doing the same thing over and over and expecting different results" is well understood in a large company. The side of it that's not noticed is the side that wants to keep talented developers but doesn't realize that we really are just doing the same thing over and over. What's really needed is to throw out the existing schedule entirely and really challenge the developers. Don't give me three months to do a task; give me three weeks, or three days. You will get a radically different solution, simply by being prevented from solving the problem the same old way.

Monday, April 21, 2008

K.I.S.S. Applies to Process As Well

K.I.S.S. (Keep It Simple, Stupid, not the 70's band) is a tenet of agile software development. But it should apply to all facets of development, not just writing code. If you've got to go to one server to get requirements, another to view design documents, a third to build your code, and a fourth to view problem reports, odds are one or more of those systems will be out of date. "One server to rule them all" is a much more maintainable approach, and ultimately a more useful one for everyone involved.

There are a number of great sites organized like this, but SourceForge is probably the largest. One project page contains all the necessary elements for the lifecycle of the project: code, documentation, support forums, bug/feature tracking, and release downloads. A user, a developer, and a manager all access the same site. This is, as always, by necessity: in a distributed development environment, every developer has to have all the necessary information available to them.

Tuesday, April 01, 2008

The Problem with the Office

No, not Microsoft Office. Not the TV show. I mean the place where you work. You know, your cube. The problem isn't exactly with the office, but rather with having office-mates around. Specifically, the problem is the availability of those people. You know, Bob the database guy, Phil the systems guy, or Jill the tester. If you have a question, you go ask them. If they have a question, they come ask you. What's the problem with that, you ask? The problem is that the resolution to that question is known only to the two of you. If anyone else wants to know the answer, they have to ask one of you. That's job security, to some people.

When an open-source developer has a question, or has to make a decision, there's usually no one around to ask. So the developer has to go to the project mailing list, or forum, or website, and ask the question of the group. Then anyone else can see the answer. If anyone thinks the question is important, it can be added to the documentation, FAQ, wiki, or whatever. That way, the documentation represents what anyone really needs to know about the project.

This problem is reflected in the "learning curve" for business projects. Some companies try to overcome this shortcoming by mentoring, shadowing, or other methods of effectively "picking someone's brain." But all they are doing is ingraining the mindset that some person knows the answer and will give it to you if you ask. This is a dangerous process, because it creates many single points of failure. What if Bob the database guy gets hit by a bus? Who else knows how to run the database? If he'd written it down, someone else would be able to take over. Overcoming the job-security mentality is a tough task. But, assuming that no one will work at the same job for their whole career, a repository of information is absolutely necessary.

Your coworkers may be great people, but stick to talking to them about non-work issues. If it's about the project, create a record of it. If you really like process, create a process to review all the wiki entries, problem reports, and mailing list entries for possible inclusion into the formal documentation. But, above all else, write it down!

Monday, March 10, 2008

Government Software, An Oxymoron

Or "Software by Earl Scheib: I Can Build That System for $49.99."

I can't wait for Google to start a Federal Systems group. Until they (or someone else with their deep pockets and development methodology) get involved, the rest of us will continue with the status quo. It is unfortunately not in our company's best interest to improve the way we develop software. In fact, poor software process is encouraged, because it allows budget and schedule over-runs, otherwise known as Cost Plus and Follow-On Work.

Both sides are at fault for this, but ultimately the blame has to fall on the developer side. The client is only asking for what they want, as they always do. The problem lies with the development staff not explaining to the client that the requests are unreasonable. Furthermore, letting the client dictate the technology used to build the system will most likely not yield the optimal architecture for the problem, which leads to more Follow-On Work. Perfect: the client gets what they want, a convoluted, inefficient system, and we get more money to fix our mistakes later!

Honestly, I can't wait for Google's first meeting with govvies, where they say "You'll get your software when it's ready" and watch multiple Generals' heads explode. I don't have any idea if Google is crazy enough to enter this arena, but if they do, they will dominate it. Why? Because the existing competition is some combination of lazy and inept. Or, we are happy doing whatever crazy job the customer thinks up next.

I can't believe that the Form-A-Committee-to-Investigate-A-Commission world of the government works as closely as it does with the Long-Haired-Free-Thought world of software development. I know they've had in-house staffs before outsourcing the work, and I think that's pretty good evidence that they don't understand how to do it. But rather than say "Build me my system and give it to me when it's ready," they still want to have control over the process and still expect success at the end. Remember Einstein's definition of insanity?

So why Google? They've got the money to make it work, but more importantly they've got the influence to break the model. Get the government's hands off the software, except for deciding what they want and making sure it works, and you get a better product. Come to think of it, "Get government's hands off... and you get a better product" is a more succinct explanation anyway.

Monday, March 03, 2008

Keep It Small

One of the largest (no pun intended) problems with building government software is too much knowledge at project inception. For the most part, we are rebuilding or adding onto an existing system. Therefore, we think we know where we want the end product to be, and we think we know all the problems we will encounter along the way. As Murphy's adage states, that is a recipe for disaster, or at least schedule slip. By not building the system as a progression, but instead taking it on as one monolithic task, the environment for innovation is diminished (to put it as nicely as possible). And, like a used car: if the system works perfectly now, why are we rebuilding it?

The fallacy that "We built this before, so we know what it does" should never be applied to a system redesign. The problem with it is that it does not account for all the other things done under the covers to create the end result. The existing system should not be used as a model for the new one; rather, the functionality the existing system attempts to provide should be recreated in the new one. Or, more simply put: look at what it does, not how it does it.

This problem is what XP calls Big Design Up Front. Simply put, taking too big a bite results in a system that's too complex to be understood, tested, or extended. Yes, it sounds like the paradox "Can God create a rock so big He can't move it?" and, applied to software, it quickly becomes "Can people create a system so big they can't understand it?" I'm sure you'll agree the answer is yes, and very quickly. In fact, the greater task is preventing that from happening.

So how do we resolve the problem of Big Design Up Front? Stop thinking about it. Take the known pieces of the system, design them so that they produce query-able, re-usable output (XML Web Services or EJBs in this era), and build them. Define the architecture and the data flows, but keep your hands off the system components. There's enough to do in defining or redefining what the users are doing, or better, what they want to do.
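As a sketch of what "query-able, re-usable output" might look like in the Java of this era, here is a hypothetical component exposed as an XML web service with JAX-WS annotations; the class, method, and data are invented for illustration.

```java
import javax.jws.WebMethod;
import javax.jws.WebService;

/**
 * One "known piece" of the system, exposed so that any other piece
 * (or a future rewrite) can query it without caring how it works inside.
 */
@WebService
public class WorkAssignmentQuery {

    @WebMethod
    public String[] assignmentsFor(String userId) {
        // real code would query this component's own data store
        return new String[] { "Inspect pump 7", "File weekly report" };
    }
}
```

The point is the boundary, not the toy data: each piece publishes a small, reusable contract, and the overall design stays free to change around it.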

I propose the Rule of One: if the problem can't be defined, described, and agreed upon in one hour, it is too complex. Most people would groan at a meeting scheduled for more than one hour, so one hour is quite enough time to organize an approach and discuss the issues with it. That leads quickly into The Simplest Thing That Could Possibly Work. Additional issues can be addressed in later iterations, but getting the ball rolling requires a quick, basic action.