Monday, August 10, 2009

Rapid Java Webservice Prototyping with Hyperjaxb (Part 2)

We resume the Hyperjaxb example with some business logic.

Step 4: Let’s start with the basics: save, delete, and query. We’ll leverage Hibernate’s ability to perform a saveOrUpdate, so we don’t need separate methods for save and update. Create a class, PurchaseOrderPersistence, and initialize Hibernate in it. I chose to use a singleton, with a synchronized accessor to make sure that lazy initialization is atomic. So I’ve got a couple of fields and two methods:

private static PurchaseOrderPersistence instance;

private final Properties persistenceProperties;
private final EntityManagerFactory entityManagerFactory;

public static synchronized PurchaseOrderPersistence getInstance() {
    if (instance == null) {
        instance = new PurchaseOrderPersistence();
    }
    return instance;
}

private PurchaseOrderPersistence() {
    persistenceProperties = new Properties();
    InputStream is = null;
    try {
        is = PurchaseOrderPersistence.class.getClassLoader()
                .getResourceAsStream("persistence.properties");
        if (is != null) {
            persistenceProperties.load(is);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (is != null) {
            try {
                is.close();
            } catch (IOException ignored) {
            }
        }
    }
    entityManagerFactory = Persistence.createEntityManagerFactory(
            "PurchaseOrder", persistenceProperties);
}

This looks for the persistence.properties file on the classpath and passes it to the EntityManagerFactory, along with the persistence unit name (PurchaseOrder) defined in the build. Remember that we created persistence.properties in step 3 to define our Hibernate connection properties. In the example, we used JNDI to bind to the datastore, but you could use a JDBC URL as well. Now that the system knows what to do with the beans, we can start creating methods. A basic getOrder method:

public PurchaseOrderType getOrder(long id) {
    PurchaseOrderType order = null;
    EntityManager em = null;
    try {
        em = entityManagerFactory.createEntityManager();
        order = em.find(PurchaseOrderType.class, id);
    } catch (Throwable t) {
        t.printStackTrace();
    } finally {
        if (em != null && em.isOpen()) {
            em.close();
        }
    }
    return order;
}

Using the EntityManager, we can look up the requested object by ID without writing any SQL. Repeat as necessary for the save and delete methods; a save might look like the sketch below.
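
Here's one way the save could go (a sketch; the method name saveOrder is ours, and merge() provides the insert-or-update behavior mentioned above):

public PurchaseOrderType saveOrder(PurchaseOrderType order) {
    EntityManager em = entityManagerFactory.createEntityManager();
    try {
        em.getTransaction().begin();
        // merge() inserts a new entity and updates one that already has an
        // hjid, giving us saveOrUpdate-style semantics in a single method
        order = em.merge(order);
        em.getTransaction().commit();
    } catch (Throwable t) {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
        t.printStackTrace();
    } finally {
        em.close();
    }
    return order;
}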

Step 5: Create the web service interface. Create a class to contain the web methods and use the JAX-WS annotations to declare the class as a web service. It should look like this:

@WebService(serviceName = "PurchaseOrderWS")
public class PurchaseOrderWS {

    // the JAXB-generated factory, handy when building response objects
    private static ObjectFactory of = new ObjectFactory();

    @WebMethod
    public PurchaseOrderType getOrder(long orderID) {
        PurchaseOrderType order = null;
        try {
            PurchaseOrderPersistence pop = PurchaseOrderPersistence
                    .getInstance();
            order = pop.getOrder(orderID);
            if (order != null) {
                System.out.println("getOrder: " + order.toString());
            }
        } catch (Throwable t) {
            t.printStackTrace();
        }
        return order;
    }
}


This creates a web service named PurchaseOrderWS, with a web method getOrder that returns a PurchaseOrderType. Keep in mind this is all without doing anything to the data objects beyond defining them in the XSD. It's not strictly necessary to keep the web methods in a separate class from the persistence methods, since there's a one-to-one mapping between them, but it's good practice and leaves some flexibility in the design.

Step 6: Package and deploy the web service. Create a web.xml deployment descriptor (even with the annotations, JBossWS needs this servlet entry to know which class to expose as an endpoint):

<?xml version="1.0" encoding="UTF-8"?>
<web-app id="WebApp_ID" version="2.4" xmlns="http://java.sun.com/xml/ns/j2ee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
  <display-name>PurchaseOrderWS</display-name>
  <servlet>
    <servlet-name>PurchaseOrderWS</servlet-name>
    <servlet-class>com.jon.purchaseorder.PurchaseOrderWS</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>PurchaseOrderWS</servlet-name>
    <url-pattern>/PurchaseOrderWS</url-pattern>
  </servlet-mapping>
  <welcome-file-list>
    <welcome-file>index.html</welcome-file>
    <welcome-file>index.htm</welcome-file>
    <welcome-file>index.jsp</welcome-file>
    <welcome-file>default.html</welcome-file>
    <welcome-file>default.htm</welcome-file>
    <welcome-file>default.jsp</welcome-file>
  </welcome-file-list>
</web-app>

This defines the mapping of the web service to the implementation class. Package a war file using Ant like this:

<!-- copy ws-related jars -->
<copy todir="${basedir}/target/classes">
  <fileset dir="${basedir}/lib">
    <include name="runtime-0.4.1.5.jar" />
    <include name="commons-lang-2.1.jar" />
    <include name="hyperjaxb*.jar" />
  </fileset>
</copy>
<!-- jar up the generated classes -->
<jar destfile="${basedir}/target/classes/generated-classes.jar">
  <fileset dir="${basedir}/target/classes">
    <include name="**/*.class" />
  </fileset>
  <fileset dir="${basedir}/resources">
    <include name="*" />
  </fileset>
</jar>
<!-- create the war file -->
<war destfile="${basedir}/target/PurchaseOrderWS.war"
    webxml="${basedir}/resources/web.xml">
  <lib dir="${basedir}/target/classes">
    <include name="*.jar" />
  </lib>
  <metainf dir="${basedir}/target/generated-sources/xjc/META-INF">
    <include name="*" />
  </metainf>
</war>

This will package the war file using the web.xml from above, including the META-INF directory generated by Hibernate containing the persistence.xml with the object-relational mapping file. It also includes a jar file of the generated classes, since they are compiled into a separate directory from the implementation classes, as well as the necessary Hibernate/Hyperjaxb jars in WEB-INF/lib. Copy the resulting .war file into your JBoss server/default/deploy directory and start it up. If all has gone well, you should be able to navigate to the WSDL for your web service. If you invoked the service, you would persist your data in the remote database without writing any SQL.
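
To sanity-check the deployment, you can generate a client from the deployed WSDL and call the service. A sketch (the URL and the generated class names are assumptions; wsimport derives the real ones from the WSDL):

// Generate client stubs first (the URL is an assumption -- check your deployment):
//   wsimport -keep http://localhost:8080/PurchaseOrderWS/PurchaseOrderWS?wsdl
// Then, using the hypothetical generated service class and port interface:
PurchaseOrderWS_Service service = new PurchaseOrderWS_Service();
PurchaseOrderWS port = service.getPurchaseOrderWSPort();
PurchaseOrderType order = port.getOrder(1L);
System.out.println("got: " + order);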

Wednesday, August 05, 2009

Rapid Java Webservice Prototyping with Hyperjaxb (Part 1)

Something different: a hands-on example of integrating a number of FOSS/COTS tools into one useful suite for rapid prototyping.

Prerequisites:
JBoss 4.2.3
Hyperjaxb 0.5.3
Oracle 9i (although the database doesn’t really matter)
JDK 1.6.

Foreword: Hyperjaxb provides an object-relational mapping for JAXB objects. That means a JAXB-compliant bean can be mapped to database persistence. Leveraging JAXB and Hibernate (hence the name), it can take a schema and generate both JAXB and Hibernate annotations in the generated beans. This allows for very rapid web service development, since the beans that your service uses are the exact same objects you are saving into the database, and they are generated with little more effort than writing the schema.

Step 1. Create a schema. For our example, we will use the well-known PurchaseOrder xsd from
http://www.w3.org/TR/xmlschema-0/#POSchema. Since it’s short, we’ll just include it here:


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

  <xsd:element name="comment" type="xsd:string"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>

  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>


This schema defines a purchaseOrder object as containing shipping information (shipTo), billing information (billTo), and the ordered item information (items).

Step 2. Compile the schema. The relevant portion of the build.xml looks like this:


<xjc destdir="${basedir}/target/generated-sources/xjc" extension="true">
  <arg line="-Xhyperjaxb3-ejb
             -Xhyperjaxb3-ejb-persistenceUnitName=PurchaseOrder
             -Xhyperjaxb3-ejb-roundtripTestClassName=RoundtripTest
             -Xequals
             -XhashCode
             -XtoString"/>
  <binding dir="schemas">
    <include name="jaxbBinding.xml"/>
  </binding>
  <schema dir="schemas">
    <include name="po.xsd"/>
  </schema>
  <classpath>
    <fileset dir="${basedir}/lib">
      <include name="*.jar"/>
    </fileset>
  </classpath>
</xjc>


This tells xjc (the JAXB compiler) to allow extensions, use hyperjaxb3-ejb as an extension, and compile po.xsd. The binding element allows schema overrides to be custom-defined. Compilation creates four classes: Items.java, ObjectFactory.java, PurchaseOrderType.java, and USAddress.java. Let’s look at PurchaseOrderType.java. You’ll notice that the JAXB annotations are present on the class variables, while the JPA/Hibernate annotations are present on the accessor methods. Also, note that a field not defined in the schema, hjid, has been added. It serves as the primary key for the table, since one was not defined in the schema. Later, we’ll show you how to handle this.
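
For reference, the generated key property typically looks something like this (a sketch of the usual Hyperjaxb3 output; the exact annotations can vary by version):

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
@Column(name = "HJID")
public Long getHjid() {
    return hjid;
}

public void setHjid(Long value) {
    this.hjid = value;
}
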
For the field shipTo, the declaration looks like this:

@XmlElement(required = true)
protected generated.USAddress shipTo;

and the accessor methods look like this:
@ManyToOne(targetEntity = generated.USAddress.class, cascade = {
    CascadeType.ALL
})
@JoinColumn(name = "SHIPTO_PURCHASEORDERTYPE_ID")
public generated.USAddress getShipTo() {
    return shipTo;
}

And

public void setShipTo(generated.USAddress value) {
    this.shipTo = value;
}

Note that shipTo is mapped as a @ManyToOne relation: many purchase orders may reference the same USAddress, so the PurchaseOrderType table carries a foreign key column (SHIPTO_PURCHASEORDERTYPE_ID) that references the address's primary key (its hjid).

Step 3. Hibernate configuration. Obviously, this will vary depending on your environment and your intended target. In this case, we will be creating a web service, served by JBoss, and connected to an Oracle database. In any case, you will need a persistence.properties file to define connection properties. For example:


hibernate.dialect=org.hibernate.dialect.Oracle9iDialect
hibernate.connection.driver_class=oracle.jdbc.driver.OracleDriver
hibernate.connection.username=user
hibernate.connection.password=password
hibernate.hbm2ddl.auto=create-drop
hibernate.cache.provider_class=org.hibernate.cache.HashtableCacheProvider
hibernate.jdbc.batch_size=0
hibernate.connection.datasource=/OracleDS


This tells Hibernate that it will connect to an Oracle 9i database using the Oracle driver, bound via JNDI to /OracleDS. Configuration of the OracleDS datasource can be found in the JBoss documentation; for our example, we’ll assume it is set up and working. Note that Hibernate allows us to replace the underlying datastore without requiring any change to the business logic. Since the connection goes through a JNDI datasource, the dialect is essentially the only Hibernate-side change required to migrate to another DBMS.
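
For instance, pointing at a MySQL datasource instead might look like this (a sketch; the dialect class is standard Hibernate, but the /MySqlDS JNDI name is our assumption):

# hypothetical MySQL setup -- only these lines change
hibernate.dialect=org.hibernate.dialect.MySQL5Dialect
hibernate.connection.datasource=/MySqlDS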

That's enough for today. The next entry will focus on writing the business logic to use the generated beans in a web service and deploying the web service into the application server.

Monday, March 23, 2009

Evolutionary Architecture

Agile Architecture, Round 2. As I (and others) have said, "Do The Simplest Thing That Could Possibly Work" applies to more than your software. So today, we'll look at the common growth process for a system, and some of the constraints that limit each phase. Our example is a web-based, database-driven application, but the pattern can be applied to any system with persistent storage. (Thanks to mxGraph for the images.)



Step 1 is the absolute basic setup: everything running on one server. One webserver, one database backend. I suppose Step Zero would be no database, just files or something. But even that trivial example is enough to get us off the ground. No failover, no backup, no load handling? No problem! Now, I would not advise using this type of system in a live environment, but for prototyping and identifying performance bottlenecks, it's just fine. Get something up and running, then see how it behaves. The ultimate capacity of this configuration is limited by how big a box you can get your hands on. But most likely, one giant enterprise server would not work as well as step 2, for reasons that will be explained.

This also would be a good place to mention the value of monitoring and logging at this level. In order to identify bottlenecks, you need to be able to see how your system responds to load.



Step 2 is a very common setup. It's not that much different from step 1, except that the database and webserver are on physically separate machines. This is the first step toward separating by function, though the split is as coarse-grained as it can be at this point. You could just as easily add a clone of the first machine, but then you've got two sets of processes to manage, which we will address in the next iteration. This setup allows us to tune each machine for its specific task: application or database access.

By introducing a second machine, we've now introduced a synchronization issue between them. If one machine goes down, the second machine needs to be able to address that somehow. Fail the request? Queue and retry? The answer is "It depends" on the situation being addressed. Serving webpages, the user may wish to submit their query again. Storing credit card requests may need to be more robust. But we can handle the problem with the next step:



Step 3 could theoretically be scaled out to hundreds or thousands of servers, and Step 4 is really a subdivision of Step 3. Notice that our number of servers is increasing exponentially: we started at 1, then added a second, and now we're managing (at least) 4. The cluster implementation can be as simple or as complicated as you make it - a J2EE container-managed cluster or simple proxy software. The important things with this step are:
  • you can now scale as many servers as you may need, quickly and simply.
  • you've removed almost all single-server failures from impacting availability.
  • you're still constrained by the database master server.
This step assumes that you're still dealing with fairly homogeneous data at this point: all the data coming and going from the servers can be dealt with similarly. This type of layout is pretty common among the Less Than Huge websites, that is, excepting the eBays, Facebooks, and Googles of the world. Once you get that huge, you will probably need a unique approach to handle your traffic anyway. But this will get you 99% of the way there. At this point, you've already addressed pushing updates to your servers and handling inconsistency between them (in the previous step). But let's say your monitoring indicates that you're writing more data than one master database can keep up with, or that you've identified a major performance bottleneck in the system. That's where step 4 comes into play:



We're now, finally, at a point that could be considered "architecture." Each type of process is located on its own server, connecting to a database that contains the data it needs to perform that function. This is where denormalization may come into play: the minor performance/space hit of duplicating data is offset by removing the need to join across the databases. Also at this point, there may be additional vertical stacking, separating the presentation layer from specific data-processing. Now we're into the classic 3 (or n)-tier model, but the internals of those layers can scale as large as we want (within some physical limits).

So to sum up, your architecture should "grow" along with your application. It should provide a framework to allow your application to handle growth, not restrict you to grow in a specific path.

Tuesday, March 10, 2009

Schedule, LOC, and Budget? You're Lying

Software engineering is an imprecise science. Anyone who tells you differently is either not working very hard, or doesn't understand that fact (or both). Actually, calling it engineering is a bit of a misnomer itself, since the engineering work in building systems is as different as engineering a submarine vs. a bridge. Engineering involves creating repeatable systems of common features. That is, there are a set of rules for building sets of bridges that have little in common with the rules for building submarines.

The problem with defining software as engineering lies in identifying the dividing line between engineering and manufacturing. Determining the amount of effort (time, code, and people) for software is much less defined than traditional "peoplespace" engineering because of the fluidity of the tools that are used. Imagine writing a proposal to build the first transcontinental railroad: 50 miles per year, thousands of workers. Now imagine the proposal today: far more production, far fewer workers. Computers allow the creation of these monstrously efficient production mechanisms. Hence the statement "hardware is cheap, people are expensive."

Looking at schedules, we see there are two types of release schedules: time-constrained (release on a certain date, like Ubuntu) or effort-constrained (all these features are in release X, like, say, Apache Httpd). A time-constrained project releases on a set date, with a variable set of features; an effort-constrained one delivers when the set features are done. Notice that neither mechanism keys on head count or lines of code: either you release on the scheduled date, with whatever is done at that time, or you release when everything is done, regardless of date.

It would be silly to create a release schedule that released every 10,000 lines of code, but that's what our snake-oil salesmen are proposing. Here's how it works: a budget is calculated based on the estimated lines of code, times the number of man-hours it would take to create them, based on historical estimates of productivity. So, like Earl Sheib, you're saying "I can build that system with 100,000 lines of code!" Calculating the budget is now a function of hourly billing rate, productivity, and estimated lines of code.

Here's an example: the customer says "Build me a new SuperWidget." The contractor looks thoughtful and says "We think SuperWidget should take 50,000 lines of code, since it's kind of like the TurboSquirrel we built before. Our normal efficiency is 1 LOC per hour (seriously), and our billing rate is $200 per hour. So the budget is $10 million. There are 2,000 man-hours per year, so we need 25 people and you can have it in one year." There's your budget, schedule, and estimate, all rolled into one. If one changes, the others have to change as well. SuperWidget is a large project, obviously.

That's seriously how it works. Oh sure, there's supposed to be more analysis to determine how similar the project will be to previous projects, but the number is still, at best, a guess. It's not like building a bridge: you can't estimate (very well) the amount of steel and concrete, simply because software is not bound to physical properties.

So how do you get this model to work? The "fudge factor" is in the productivity: you set your productivity so low that you know, absolutely, you won't miss the estimate. Why do you only produce 1 LOC per hour? StarOffice, considered one of the largest open source projects, is estimated at around 9 million LOC. That's 4,500 man-years, using our previous figure of 1 LOC per hour. 4.5 years with 1,000 developers, or 45 years with 100 developers. Obviously, something else is going on. Estimates show that actual productivity can be around 10 times our figure. But how do you get there? That's the topic for next time.

Friday, March 06, 2009

Score Your Work!

Today's topic will be a quiz: how much does your employer suck? Total up the points at the end (javascript if I feel nice).

1. Cafeteria - Does your company:
offer tasty, fresh food upon request for a reasonable price, including free? (1 point)
offer mediocre food at their convenience? (0 points)
employ LunchLady Doris? (-1 point)

2. Security - Does your company:
treat you like an adult, and keep physical security to a minimum (badge readers, etc)? (1 point)
Have some sort of obviously CYA policy in place involving security guards? (0 points)
check your badge at the gate, again at the building, then again at the entrance to your office? (-1 point)

3. Work Environment - Does your company:
Have modern, clean, up-to-date, large, well-lit work areas? (1 point)
Have old, dirty, out-of-date, small, dark work areas? (0 points)
Have all of the previous, plus ongoing construction, so it sounds like you're working on a major street at rush hour? (-1 point)

4. Equipment - Does your company:
provide you with what you need to do your job, and provide an easy mechanism to acquire it? (1 point)
begrudgingly provide the minimum of what you need, but make the process so tedious that it's usually not worth the effort? (0 points)
prefer to maintain a death-grip over the infrastructure, in order to maintain their need to exist?(-1 point)

5. Management - Does your company:
have a reasonable number of managers, who are willing and capable of working with you to effect change in the way the company does business? (1 point)
have too many managers, some of whom have no noticeable effect on productivity? (0 points)
have so many managers that they create a totally separate reporting chain to separate work and labor-related issues? (-1 point)

More to come!

Thursday, February 19, 2009

The Blob Problem

No, that's not a typo of Blog. The Blob Problem goes hand in hand with scope creep. As a project grows, it grows from a well-defined box into a poorly defined blob. The box, and blob, correspond to the answers to the questions "What does it do?" and "How does it do it?" From the beginning, the box clearly defines the application's domain, and the volume inside the box (I'm already stretching the metaphor) is open and visible.

As the project grows, "dark areas" of the system appear, and the box becomes less transparent, more translucent. It grows out of its original area, and starts stretching to join other areas of interest. Rather blob-like. Over time, the function and scope of the original project has morphed into something largely different.

So what do you do? As always, Do The Simplest Thing That Could Possibly Work. So the answer is, "It depends." But do what you like. I'm personally fond of a wiki with some design documentation in it. The debate about whether UML fits into DTSTTCPW still rages on, but I like to use UML more like a sketch tool. I don't care if it's correct, or that all aspects of the design are analyzed. If it explains "X talks to Y," or "X does A, B, then C" better than words would, then it covers DTSTTCPW.

Your system is still a set of Legos. Once it becomes one large Lego, the internal shape needs to be described, so that someone can use it and/or work on it without a lot of extra effort. The amount of documentation that is needed is surprisingly low, and for good reason: too much documentation is worse than too little. By trying to plow through a mountainous tome of documentation, you're wasting your developers' time. And by creating, reviewing, and maintaining that stack, you're wasting everyone's time.

Tuesday, December 30, 2008

PHP for Java Developers

After years (and years) of working solely with Java, I've started learning some basic PHP. Coming from a strongly-typed, object-oriented background, it has been quite an interesting adventure. First off, I'll say that with an IDE like Eclipse, the programming language is almost irrelevant, once you understand the basic syntax for control structures and declarations. Once over that hurdle, it really becomes a matter of finding the right functions in the right libraries to do what you need to do.

I don't intend to get into which language is better. It's like arguing whether a hammer is a better tool than a saw. They fill different niches, and the bottom line is it's always better to have more tools in the drawer. But there are a few things that will catch a Java developer, and those are what I want to mention.

Global Scope and Declaration
PHP lets you do something like this:

$textlocation = $linelocation + 5;

Simple, right? Well, the gotcha is that if you forgot to declare $linelocation, PHP will try to figure out what you meant. In this case, $linelocation will be treated as an integer with value 0. In Java, that wouldn't compile, since there's no type associated with the variable and no value assigned to it.

On the same topic, global variables have to be declared to be global.
$var = 5;
function foo()
{
    echo "var: $var\n"; // prints "var: " -- $var is undefined inside foo()
}

Would not output 5 as you might expect from
public class Bar {
    private int var = 5;
    public void foo()
    {
        System.out.println("var: " + var);
    }
}

The reason, again, is the scoping of dynamic variables: unless $var is declared as "global $var;" inside of foo(), PHP assumes you don't want the global variable and treats $var as a new, undefined local.

Now, the good side of dynamic variables. PHP will also allow you to do things like
$data = array ("date" => "12-29-2008", "name" => "Winter Park",
"min temp" => 3);
all in the same array: the first element is a date (as a string), the second a string, and the third an integer. Now you can argue that Java deliberately doesn't allow that type of array, and the Generics changes in JDK 1.5 further restrict that kind of action, but there are instances where it is very handy and very quick.

OK, quick complaint: PHP doesn't have a println function. You have to remember to print "\n" after every echo or print command. There, I said it. I guess I can make a
function println($var)
{
echo "$var\n";
}

but that seems like overkill (even for a Java developer).

Database Access
Unlike Java, PHP has a much more streamlined database access system.
$connection = mysql_connect("localhost:3306", "user", "pass");
$result = mysql_query("select * from schemaname.table");
while (($row = mysql_fetch_row($result)) !== false)
{
    var_dump($row);
}
mysql_close($connection);
Opens, queries, prints the returned rows, and closes the database connection. 7 lines (counting braces). And since $row is an array, you can access it like any other array $row[0] is the first column, etc. Which leads into the next point:

Exception Handling
PHP does not require any exceptions to be caught, even though they can be thrown. It will handle migrating exceptions up the stack for you until it finds a handler or the top of the stack. So you can effectively ignore errors without having ... throws OneException, TwoException on every method like Java.

Arrays
Maybe I've written one too many classes using StringTokenizer, but being able to turn
$var = "0 1 2 2";

into an array of string using
$vararray = explode(" ", $var);

is really handy.
As is being able to interchange arrays, lists, and tables on the fly. So you can do things like this:

$vararray[0] = "zero";
$vararray["one"] = 1;
$vararray[] = "two"; // add to end of list

more to come!

Thursday, November 13, 2008

Reformed Adblocker Speaks Out

Like most internet users, I am annoyed by ads. So I use Adblock Plus in Firefox to eliminate ads. Seems like 28,742,796 people agree with me, since that's the number of downloads Adblock Plus has had, according to Mozilla.com (as I write this). That's a lot of people saying one thing: "Stop bugging me with ads." But is it throwing out the baby with the bathwater?

Many (most?) web sites survive on a mixture of ad revenue and sales. One of these is minute (ad revenue hangs around $0.005 per view, so 1,000 views will pay $5.00), the other relatively large (buying a shirt at $30 or so). Used correctly, the two can be used to create a sustainable income. Sustainable for the site's bills, not necessarily for the owner's, anyway. But the problem comes from the logic that "if a little is good, more will be better": if a little advertising generates X dollars, then more, intrusive, annoying advertising will generate X+Y dollars (I'm not even going to bother to make up numbers).

Those of you who have been around long enough remember the progression. First, a banner ad at the top of the site. Then, an X10 pop-up (remember those?), then an inter-site ad between pages. Now, all of those above, plus the textual-context pop-up and whatever else they can think of. At some point the line was crossed and Adblock became a lifesaver. People said "Enough blinking, flashing punching of monkeys!" and blocked it all.

Then along came Google, with simple, targeted text ads. It's like the English butler of web ads. "Excuse me sir, if I may, I have some products that may be related to what you are looking at. They're over here if you'd like." Compared to the used-car salesman method used before, Google single-handedly overhauled the web advertising model. But, how many people can see them, assuming they are blocking all ads with Adblock? Some may argue that they will look for products when they look for products and content when they look for content and don't want the two to mix. That's fine, those people are not the ones I'm worried about.

I think a reasonable middle-ground can be reached. Use Adblock, but with a whitelist to approve of sites that you enjoy/aren't annoying with their ads. Start with this:

@@/pagead2.googlesyndication.com/*$script,subdocument

to unblock Google ads. Then remember to right-click on the Adblock stop sign icon and click "Disable on site" so they can get some of their revenue back.

I think a more permanent solution can be found with something like OpenID. With it, you can have an account and log into any participating site. Using it, a site (or series of sites) can track how much a user has contributed and show/hide ads appropriately. If a user makes a donation of $10, they don't see ads for a year. Like a subscription, but applied to all the sites under the umbrella. Recognizing that:
  • people want something other than being bombarded by ads
  • web sites won't survive without revenue
will create a new model that benefits everyone.

Monday, September 29, 2008

Where Does The Time Go?


When you start a project, don't you marvel at how efficient you are? How much you can get done in so little time? Why can't we always be that efficient? There are a number of reasons why efficiency drops off. Producing documentation (oh, you want someone else to help you, or you want someone else to use your project?), meetings (you need to work with these other people?), and bug fixes are the primary detractors from producing new functionality.

I've said before that Lines of Code is a horrible metric for measuring software development. It's like measuring hammer strokes in building a house, or the number of licks to get to the center of a Tootsie Pop. So I will be using percentage of time as a virtual metric. What percentage of time is spent on which task is a much more useful estimation than counting the number of nails used in a wall.

When you start a project, obviously most of your time is spent building new features. There may be some note-taking along the way to help you remember what you're doing, but really you're just trying to produce something that works, to solve your original problem. As you move along, an equilibrium will emerge between new code, bug fixes, documentation, and meetings. You will wind up with a graph that looks like this:

[chart: mostly new features, with smaller slices for bug fixes, documentation, and meetings]

Good amount of new growth, reasonable amount of bugs being fixed, not too much other stuff. Enough documentation to be useful, but not so much as to be a mountain. A few meetings to discuss what's going on and how it's going.

Compare to these two horror cases:

[chart]

or

[chart]

Given that there are only 24 hours in a day (please don't let my boss find that out...), for every additional chunk of time taken doing something else, that's time that can't be spent building software. So how do we maximize the time spent building software, and find the sustainable balance between too much documentation and too little? That's a topic for another day.

Thursday, September 04, 2008

Making a List, Checking it Twice

I woke up this morning and thought to myself "How does Santa Claus keep himself motivated? He has one big deadline once a year. For 364 days of the year, he really doesn't have much to do."

OK, seriously, like most people, I've been having a hard time getting out of bed, especially now that it's dark out when the alarm goes off. And the first thing that goes through my head is "Ugh, another day, just like the last one." But today I made a quick list of things I needed/wanted to get done for the day, and I actually felt motivated to get out of bed and go to work. Of course, I'm now working on my blog, but I've still got an agenda for the day.

The list for the day is pretty simple (hey, I thought it up at 5:45 this morning, cut me some slack):
  • Configure CruiseControl to do something useful
  • Find my best three job postings and send resumes to them
  • Write up a couple business ideas (things to do if I get tired of programming)
It definitely makes it easy to get out of bed when you've got something to do, something new to learn, something new to try. Hope it helps you as well.

Tuesday, September 02, 2008

Is Your Company In Trouble?

Obviously, I'm going to look at this from a software development perspective, but it will include some general business observations (10 years of work experience gives you some business sense). Without espousing a particular methodology (although I'm sure I've made it clear what I support), ask yourself the following questions:

  • Does this Company offer a challenge that's innovative and interesting?
  • Do they offer easy access to innovative and interesting challenges (how easy is it to work on something that interests me)?
  • How would a new idea be received? With questions like "How do we budget for it?" or "How can we use that to save money?"
  • Is the Company worried about meeting estimates, rather than providing functionality?
  • What's the Meeting/Time Quotient? Do you have to constantly attend meetings and provide status, or can you get your work done with few interruptions?
  • How is the office environment? Are you provided with just what you need to get the job done, or do you have an office that you'd like to show people?
  • Is their business model currently or potentially threatened by other Companies? Are they doing anything to become/remain the industry leader? How open are they to new ideas to help the company?
  • What is the company attitude about new software? Can you try it out and see if it helps your situation, or do you need approval to try it?
  • How long does it take to "do something"? How many steps in how many systems involving how many people does it take to report a bug/checkout/fix/build/deploy/test/check in/close the bug? If any of those numbers is large, what is preventing the system from being optimized?

Monday, August 25, 2008

Are You Wasting Time?

No, I don't mean reading my blog or surfing the internet or watching youtube videos. I mean even when you are working, are you spending time doing repeatable tasks that could be automated, and are you doing tasks that do not contribute to the development of the product?

The first question is the easy one to ask, but can be a hard one to fix. For example, does your build take you all the way from the built baseline to a running executable to test or deploy? If not, what's holding it up? Is the build simply not set up to do that, or is some portion of your system not configured for it? If it's the second case, that's a bigger problem, but it can still be overcome by working with your system administrator and addressing the holdup.

The second question is the bigger task, and will usually require buy-in from your program Higher-Ups. I see this problem as a corollary to the Agile Manifesto, which states (among other things) "Working software over comprehensive documentation." I like to sum that statement up simply: "Does this help me do my job? Would this help someone else do my job?" If the answer is "No," don't do it. You can quickly see where that would be a problem with the Higher-Ups, and the "That's the Way We Do It" crowd.

Another corollary to the Agile Manifesto is "It's OK to Screw Up", meaning, do something, give it to the users, and they'll tell you what they like and don't like about it. But do it fast enough to have enough time to fix the problems, and add the new features to it. This can't be done in a rigid environment, where it takes weeks just to get a fix to the users, never mind the amount of time to develop a new feature. To do it, and do it fast, and get it right, actually takes less overhead than you'd think.

Friday, August 22, 2008

Building Blocks or Jigsaw Puzzle?

Is your system building blocks for a larger system, or jigsaw puzzle pieces to build the same system? A puzzle piece fits into one and only one spot. A building block will fit into one spot, but can fit into other spots as well. I also like to think of it as a "spiky vs. smooth" interface.

What makes good building blocks? The obvious buzzwords "extensible, re-usable, decoupled, cohesive modules" apply, but it's easier to look at it concretely. How hard is it to add new data to the system, and how hard is it to make the system do something else? If the answer to either of those is "not easy," then you're building jigsaw puzzle pieces. Look at your method signatures. Do they perform specific functions on specific types of data? Those are puzzle pieces. Do they perform specific functions on generic data, or (better still) generic functions on generic data? Those are building blocks.

So what makes reusable, extensible data objects? Obvious answers are object inheritance and using a syntax like XML. But no matter how you do it, you need to identify similarities between data and only handle unique data in special cases. Abstraction of data into generic objects for modeling and design helps describe the mappings between similar data.

An example of what I'm talking about. How many times have you seen "The Bank Example?" But imagine that example built to handle one very specific type of currency and a couple specific use cases:
depositDollar
and
withdrawDollar.

Simple, straightforward, and almost totally non-reusable (without refactoring). The use cases would be diligently mapped to methods named
depositDollar(Dollar dollar)
and
Dollar withdrawDollar()

in class AccountAccess. Dollar, being a good data class, would contain all the useful metadata about your dollar: serial number, creation year, number of wrinkles, that sort of thing. On the server, the business logic would need to store the Dollar and withdraw the Dollar from the Account, and do all the associated checking that goes along with it. So now, how do we add a new data type or another operation? Yup, create a new use case and a new method. Repeat ad infinitum (or at least ad nauseam).

Now we've got a system doing very similar things on very similar data, but it's hard to refactor because you're used to thinking about the specific types of data, that a Dollar is somehow different than a Quarter, or a Euro. The only thing that can pull us out of this mess is to re-evaluate the data, and more importantly, what the users want to do. They want to put money into their account and withdraw money from their account. Now, generically, we can use methods
deposit(Money amount)
and
Money withdraw(Money amount)

where Money is, of course, an interface that Dollar, Pound, Euro, etc. all implement.
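
A minimal sketch of that shape (all names here are ours for illustration):

import java.math.BigDecimal;

interface Money {
    BigDecimal getAmount();
    String getCurrencyCode(); // "USD", "EUR", ...
}

class Dollar implements Money {
    private final BigDecimal amount;
    Dollar(BigDecimal amount) { this.amount = amount; }
    public BigDecimal getAmount() { return amount; }
    public String getCurrencyCode() { return "USD"; }
}

class AccountAccess {
    // one generic pair of methods replaces depositDollar/withdrawDollar,
    // depositEuro/withdrawEuro, and so on
    public void deposit(Money amount) { /* store and verify */ }
    public Money withdraw(Money amount) { /* check balance, debit */ return amount; }
}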

Now, I realize that there will have to be some "business logic" in the system, otherwise the system doesn't do anything unique. But the amount of uniqueness should be low, and it should decrease, not increase, as the system expands.

Wednesday, August 13, 2008

Branches, Refactoring, and Deprecation

One of the biggest hurdles in refactoring is answering the question "How do I merge changes into something that's no longer there?" I will attempt to lay out a useful strategy for addressing that problem.

In our example, we've got a class that has a number of methods. The number of methods continues to grow (and the amount of logic in those methods grows as well), to the point that the class is unmanageable. It's no longer clearly defining its task (if it ever did), and has wandered off into areas that are outside its scope. But, because of its importance to many areas of the system, and because of its poorly defined task, it has a lot of bugs in it. The bugs may be from improper implementation, or incomplete analysis. Either way, there's a lot going on in here.

The obvious answer is "Refactor it." However, that's not going to be easy, since there are still bugs that need to be fixed during and after refactoring. How do we deal with this? With a cunning use of deprecation and branching.

Step one is to create a branch for the refactoring. The syntax and details of doing this are different for every version control system out there, so I won't get into it. Now you've got an area set aside just for refactoring without any outside influence and are able to verify that you haven't broken anything, through automated or manual regression testing.

Step two is to deprecate the methods or classes that you've modified during refactoring. But don't remove them! Leaving them in place keeps a merge point where you can pick up any additional changes that happened during refactoring and testing, and it informs other developers that the class is about to change and the current method is no longer supported. In addition, a beneficial practice when deprecating (I'll use Java as an example, since I'm most familiar with it) is to include the @see tag to point the other developers at the right method.
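
For example (the names here are hypothetical), a deprecated method can delegate to its replacement so old callers keep working until every branch has migrated:

/**
 * @deprecated Replaced during refactoring.
 * @see OrderValidator#validate(PurchaseOrder)
 */
@Deprecated
public boolean validateOrder(PurchaseOrder order) {
    // delegate to the refactored implementation so this merge point
    // stays functional until all branches pick up the change
    return new OrderValidator().validate(order);
}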

Step three is to merge any changes into your branch. This will be a two-step process: merge the changes into the original spot, then apply them to the refactored locations. This may seem time-consuming, but it is really the only way to ensure that your branch is up to date before merging into the main line. Re-run any tests for the changes to make sure that you merged them correctly (at this point, the benefit of automated testing over manual testing should be more obvious). Now you can merge your branch into its parent.

Step four is only necessary if other people are working on their own branches as well. After such time has passed that all the branches have picked up the refactored class, you may delete the deprecated methods or classes. This additional wait allows the other branches to migrate to the new organization at their own pace. They will have to do the same step you did before merging: apply any changes they've made to the original method to the refactored one.

Tuesday, August 12, 2008

Development by Telephone Game

Does your development process have a lot in common with the Telephone Game (also called Chinese Whispers or Russian Scandal)? I mean, how many steps, and how many people, are involved in getting what the users want into their hands?

Do the users tell their management, who tell the prime contractor's systems engineers, who tell your system engineers, who tell you what to build? Then, do you give your build off to the integration team, who gives it to the maintenance team to give to the users? Let's see, that's seven steps between the users and the end product.

That's one key area where open-source does it right, because they have to. The user submits a bug or new feature to the developers. The developers fix it or build it and give it back. That's one step (OK, there's probably a couple internal steps for reviews, and testing, but I overlooked those in the first example, so I'm overlooking them in this example too). And funny enough, it creates a stable product that delivers the functionality the user wanted faster than "Doing it the right way."

Are you naive enough to think "But think of how good the product would be if they did it our way"? The reason open source works (and works fast) is that there aren't multiple steps between the user and his product. The best tools are used BY the developers (see GCC, for example) during development. If there's uncertainty, the developer talks to the user. There's no "pushing issues up the chain" and waiting for resolution. There's just a need, and the fulfillment of that need.

So call it Agile, call it XP, call it whatever you want. But if you base your development around creating user functionality instead of data functionality or system functionality, you will produce more user-usable functionality faster, simply because everything you build addresses some user need.

Wednesday, July 30, 2008

The Cost of Big Servers

I decided to compare the cost/benefit of a blade arrangement versus a "big server" setup. For the comparison I've configured a Dell PowerEdge M600 as my blade and a Sun Fire X4600 M2 as my "big server" setup.
Price:
Dell = $2,605
Sun = $15,995.00

CPU:
Dell = 2xQuad Core Intel® Xeon® E5405, 2x6MB Cache, 2.0GHz, 1333MHz FSB
Sun = 4 Dual-Core AMD Opteron - Model 8222, 3.0 GHz

RAM:
Dell = 8GB 667MHz (8x1GB), Dual Ranked DIMMs
Sun = 16 GB (8 x 2 GB DIMMs)

Disk:
Dell = 2x73GB 10K RPM Serial-Attach SCSI 3Gbps 2.5-in HotPlug Hard Drive
Sun = 292 GB (2 x 146 GB) 10000 rpm SAS Disks

OS:
Dell = RHEL 4.5ES (although I would put some other Linux variant on if it were up to me).
Sun = Solaris 10 (shocker, I know)

Roughly, the Dell is 6x cheaper than the Sun, which means I can buy 6 of them for the price of the Sun (duh). Now, partitioning that system could create two database servers (those would need more disk, obviously) and four application/web servers. So for the price of one "big server," I've already got a cluster of four app servers and a cluster of two database servers. Arrange as needed for your system.

Throwing disk arrays into the mix continues this thought. Comparing a Dell PowerVault NF600 to a Sun StorageTek 5320 NAS Appliance again shows the case against "one big server," even though we're comparing network disk servers:

Price:
Dell = $4,926
Sun = $48,164.00

Disk capacity:
Dell = 4x400GB 10K RPM Serial-Attach SCSI
Sun = 16 x 500 GB 7200 rpm SATA Disk Drives

CPU:
Dell = Dual Core Intel® Xeon® 5110, 1.60GHz, 4MB Cache
Sun = 1 x 2.6 GHz (assuming AMD)

Now we're in the ballpark of 10x price difference. So I could have 10 Dell NAS drives serving my 6 servers (again, I know there's no reason to have 10 drives for 6 hypothetical servers, but just to prove a point), vs one big drive serving one big server.

The price of adding "commodity-level" hardware is so (relatively) low, that there's almost no reason not to do it. Planning to build on this type of architecture quickly leads to partitioned applications, so that a new application can get its own host, or any service that needs more power gets more power, rather than improving the performance of the service. Figure $100/hour per developer. If it takes more than half a man/week to profile, re-code, retest and redeploy the service, you'd be better off throwing hardware at the problem. And as the net load on the system grows, it's much easier to build out horizontally when the increment size is small.

Now I know that you can run multiple virtual servers on one "big server," but the point I'm trying to get across is that the bang-for-the-buck of the "little" servers is much, much greater, and actually building around small servers creates a more modular, scalable software architecture.

Wednesday, July 23, 2008

Agile Architecture

Agile (or eXtreme Programming) development usually focuses on the development of the software itself, but it can (and should) be applied to all parts of the system. If the architecture of the system is determined before the first iteration and not open for refactoring or other redesign principles, then it falls into the category of BigDesignUpFront. To put the problem another way, how can the domain of the system be determined if what the system will do is not determined? And, if the system functionality is already determined, then there's no room to DoTheSimplestThingThatCouldPossiblyWork.

One argument against DoTheSimplestThingThatCouldPossiblyWork is that nothing complex (like a compiler) would ever be built. I disagree, simply because DTSTTCPW should only be applied to the domain as it is understood. So TheSimplestThing when you are getting started should be different than TheSimplestThing as your system becomes operational. TheSimplestThing when you are working in isolation on your personal computer is massively different from TheSimplestThing when you are providing a service on the Internet. As a ridiculous example, think about the problem of searching for a given word in your filesystem. Would you build Google to do it? No (or I sincerely hope not). You'd use find or perl or some other scripting language. Now, if you want to let anyone search for any word in any document on the Internet, you've got to build Google, and all the system availability and performance benchmarks that go along with it.

The great thing about Web technologies is that they scale well, and don't require replacing to service greater demand or performance objectives. If one webserver can handle your own requests, then a cluster can handle thousands of them. But how do you refactor an architecture? Most people think of architecture as an inflexible part of the design. Nothing could be further from the truth. Writing POJOs (Plain Old Java Objects) leads quickly into EJBs (Enterprise Java Beans) in a J2EE container, or web services. Same goes for any web-based scripting language. Command line PHP can be easily hosted by Apache and serve the same data over the web.

The key point is that just as your software should be designed iteratively, so should the architecture. As the system requirements grow and change, the architecture should grow and change with it. Will there be additional software changes to handle the new architecture? Probably. Will the overall system be better suited to handle the system parameters? Yes. Otherwise, you will be trying to put a now-square peg into a hole that's still round. It can be done, but it sure isn't TheSimplestThing.

Monday, July 21, 2008

The Network Is The Problem

The first goal of any architecture should be to address two questions: how does the user interact with the system, and what does the system do when the user can't connect? Obviously, the goal of the system is to provide as much connectivity as possible, but there will always be potential problems between the client and the first network connection. The system will need to address how to handle data that does not make it to the server (like Blogger, for example).

How the user interacts with the system will affect how the system handles the data when the network is down. Of primary importance is what to do with any data that would have been transmitted from client or server, and how to recover once connectivity is restored. For a time-critical system, an additional important decision must be addressed: what to do if the request isn't received in time. If the request can't be fulfilled, does it still have value?

By focusing on the user input, you can keep your project in scope. If you start looking at data, then you can start looking at all the places that data can go, and all the systems that need to interact to deal with the data. By focusing on what the user needs, rather than what the system requires, you can iteratively add more and more features, and rebuild the system as new capabilities are required for those features.

A trivial example: a user needs to know what his work assignments are. So he 1) logs into the system (define the scope of the user environment, address network connectivity - see, I told you it would be important), 2) requests his work (pre-defined data, manual query?) 3) displays the data (what kind of display is available?)

Now, once those decisions are addressed, they don't need to be addressed for the user to create a new work assignment. Rather than attempting to figure out all of the system up front, the system can provide a needed functionality quickly. Once a decision is made, it can be applied to all other functionality in the system.

Monday, July 14, 2008

Agility 101: Refactor Your Build

One of the problems with using an adjective like "agile" or "extreme" (ugh) is that it implies a degree. As in, "A mouse is more agile than an elephant." However, you could say the elephant is more agile than, say, an aircraft carrier. So, off the bat, agile seems open for interpretation, whereas Rational Unified Process or Waterfall have well defined steps that you either are doing or aren't doing. This is wrong. Agile has a very well defined set of processes, but since the term seems to lend itself to statements like "We need to add some agility," it is working at a disadvantage.

The primary goal of agile development is continuous, sustainable development. If this is not possible in your current environment, ask yourself why not? A primary area for improvement is the build/deploy/test environment cycle. If it takes hours to build, deploy, and coordinate testing, then you've reduced the amount of time available to do anything useful. Some important points to consider:

  • How many steps does it take to go from build to running system? If it's very many, re-examine all the functionality that Ant provides.
  • Do multiple versions of the system potentially interfere with each other? If so, figure out a way to run multiple versions in isolation. As an additional point, can multiple versions of the same software co-exist on the same server or in the same environment? If I need to care what you are doing, that's going to take time to rectify.
  • Can I build and test multiple branches without interfering with a previous build? To reiterate the previous point, if it takes time to reconfigure and set up to test a fix or enhancement, that's time that can't be used for coding.


Given the capabilities of a tool like Ant, there's no reason that a complete, executing system cannot be produced out of every build.

Tuesday, June 24, 2008

Improve Product, Not Process

Believing that process improvement will improve the product is like believing spending more time looking at the map before taking a trip will prevent you from running into a traffic jam along the way. Process improvement creates paint-by-numbers, it creates fast food. Which is fine if that's your goal, but don't expect anything other than paint-by-numbers or fast food. Otherwise, we're treading into Einstein's definition of insanity: "Doing the same thing over and over and expecting different results."

If you feel you must improve your process, plan your process for change. There are too many uncontrollable variables to have any expectation of a successful delivery with a single waterfall method. So plan your process for more frequent changes, more frequent reviews, and more frequent releases. Hmm, this sounds like agile (without a capital A).