Monday, August 10, 2009

Rapid Java Webservice Prototyping with Hyperjaxb (Part 2)

We resume the Hyperjaxb example with some business logic.

Step 4: Let’s start with the basics: save, delete and query. We’ll leverage Hibernate’s ability to perform a saveOrUpdate, so we don’t need separate save and update methods. Create a class, PurchaseOrderPersistence, and initialize Hibernate. I chose to use a singleton, just to make sure that the access will be atomic. So I’ve got two methods:

// synchronized so that two threads cannot both see a null instance
public static synchronized PurchaseOrderPersistence getInstance() {
if (instance == null) {
instance = new PurchaseOrderPersistence();
}
return instance;
}

private PurchaseOrderPersistence() {
persistenceProperties = new Properties();
InputStream is = null;
try {
System.out.println("loading properties");
is = PurchaseOrderPersistence.class.getClassLoader()
.getResourceAsStream("persistence.properties");
System.out.println(PurchaseOrderPersistence.class.getClassLoader().getResource("persistence.properties"));
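if (is == null) {
// getResourceAsStream returns null when the file is not on the classpath
throw new IllegalStateException("persistence.properties not found on classpath");
}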
persistenceProperties.load(is);
System.out.println("props: " + persistenceProperties.toString());
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally {
if (is != null) {
try {
is.close();
} catch (IOException ignored) {
}
}
}
// The unit name must match the persistenceUnitName passed to xjc in the build.
entityManagerFactory = Persistence.createEntityManagerFactory(
"PurchaseOrder", persistenceProperties);
}
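
A lazily created singleton like this one is only safe if two threads can never race through the null check at the same time. One common way to get that guarantee without explicit locking is the initialization-on-demand holder idiom. The following is a minimal sketch of that idiom, not the code used in this project: ConfigHolder and its loadCount counter are hypothetical names standing in for PurchaseOrderPersistence and its expensive setup.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for PurchaseOrderPersistence.
final class ConfigHolder {

    // Counts constructor calls so the single-instance behavior can be checked.
    static final AtomicInteger loadCount = new AtomicInteger();

    private ConfigHolder() {
        // Expensive setup (loading properties, creating the factory) would go here.
        loadCount.incrementAndGet();
    }

    // The JVM initializes this nested class once, on first access,
    // under its own class-initialization lock.
    private static final class Holder {
        static final ConfigHolder INSTANCE = new ConfigHolder();
    }

    static ConfigHolder getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        ConfigHolder a = ConfigHolder.getInstance();
        ConfigHolder b = ConfigHolder.getInstance();
        System.out.println(a == b);
    }
}
```

The instance is still created lazily, on the first call to getInstance(), but there is no null check left to race on.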

What it’s doing is looking for the persistence.properties file and passing it to the EntityManagerFactory along with the persistence unit name (PurchaseOrder) defined in the build. Remember that we created persistence.properties in step 3 to define our Hibernate connection properties. In the example, we used JNDI to bind to the datastore, but you could use a JDBC URL as well. Now that the system knows what to do with the beans, we can start creating methods. Basic getOrder method:

public PurchaseOrderType getOrder(long id) {
PurchaseOrderType order = null;
EntityManager em = null;
try {
em = entityManagerFactory.createEntityManager();
order = em.find(PurchaseOrderType.class, id);
} catch (Throwable t) {
t.printStackTrace();
} finally {
if (em != null && em.isOpen()) {
try {
//em.close();
} catch (Throwable t) {
t.printStackTrace();
}
}
}
return order;
}

Using EntityManager, we are able to look up the requested object by ID without needing any SQL. Repeat as necessary for the update and delete methods.

Step 5: Create the web service interface. Create a class to contain the web methods and use the WS annotations to declare the class as a web service. It should look like this:

@WebService(serviceName = "PurchaseOrderWS")
public class PurchaseOrderWS {

private static ObjectFactory of = new ObjectFactory();

@WebMethod
public PurchaseOrderType getOrder(long orderID) {
PurchaseOrderType task = null;
try {
PurchaseOrderPersistence pop = PurchaseOrderPersistence
.getInstance();
task = pop.getOrder(orderID);
if (task != null) {
System.out.println("getOrder: " + task.toString());
}
} catch (Throwable t) {
t.printStackTrace();
}
return task;
}
}


This creates a web service named PurchaseOrderWS, with a web method getOrder that returns a PurchaseOrderType. Keep in mind this is all without doing anything to the data objects beyond defining them in the XSD. It's not 100% necessary to have the web methods in a separate class from the persistence methods, since there's a one-to-one mapping between them, but it's good practice to allow some flexibility in the design.

Step 6: Package and deploy the web service. Create a web.xml deployment descriptor (is this necessary with annotations?)

<?xml version="1.0" encoding="UTF-8"?>
<web-app id="WebApp_ID" version="2.4" xmlns="http://java.sun.com/xml/ns/j2ee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
<display-name>PurchaseOrderWS</display-name>
<servlet>
<servlet-name>PurchaseOrderWS</servlet-name>
<servlet-class>com.jon.purchaseorder.PurchaseOrderWS</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>PurchaseOrderWS</servlet-name>
<url-pattern>/PurchaseOrderWS</url-pattern>
</servlet-mapping>
<welcome-file-list>
<welcome-file>index.html</welcome-file>
<welcome-file>index.htm</welcome-file>
<welcome-file>index.jsp</welcome-file>
<welcome-file>default.html</welcome-file>
<welcome-file>default.htm</welcome-file>
<welcome-file>default.jsp</welcome-file>
</welcome-file-list>
</web-app>

This defines the mapping of the web service to the implementation class. Package a war file using Ant like this:

<!-- copy ws-related stuff -->
<copy todir="${basedir}/target/classes">
<fileset dir="${basedir}/lib">
<include name="runtime-0.4.1.5.jar" />
<include name="commons-lang-2.1.jar" />
<include name="hyperjaxb*.jar" />
</fileset>
</copy>
<!-- create war file-->
<jar destfile="${basedir}/target/classes/generated-classes.jar">
<fileset dir="${basedir}/target/classes">
<include name="**/*.class" />
</fileset>
<fileset dir="${basedir}/resources">
<include name="*" />
</fileset>
</jar>
<war destfile="${basedir}/target/PurchaseOrderWS.war"
webxml="${basedir}/resources/web.xml">
<lib dir="${basedir}/target/classes">
<include name="*.jar" />
</lib>
<metainf dir="${basedir}/target/generated-sources/xjc/META-INF">
<include name="*" />
</metainf>
</war>

This packages the war file using the web.xml from above and includes the META-INF directory from the generated sources, which contains the persistence.xml with the object-relational mapping. It also includes a jar file of the generated classes, since they are compiled into a separate directory from the implementation classes, along with the necessary Hibernate/Hyperjaxb jars in WEB-INF/lib. Copy the resulting .war file into your JBoss/server/deploy directory and start it up. If all has gone well, you should be able to navigate to the WSDL for your web service. If you invoked the service, you would persist your data in the remote database without writing any SQL.

Wednesday, August 05, 2009

Rapid Java Webservice Prototyping with Hyperjaxb (Part 1)

Something different: a hands-on example of integrating a number of FOSS/COTS tools into one useful suite for rapid prototyping.

Prerequisites:
JBoss 4.2.3
Hyperjaxb 0.5.3
Oracle 9i (although the database doesn’t really matter)
JDK 1.6.

Foreword: Hyperjaxb provides an object-relational mapping for JAXB objects, which means that a JAXB-compliant bean can be persisted to a database. Leveraging JAXB and Hibernate (hence the name), it takes a schema and adds both JAXB and Hibernate annotations to the generated beans. This allows for very rapid web service development, since the beans that your service uses are the exact same objects that you save into the database, and they are generated with little more effort than writing the schema.

Step 1. Create a schema. For our example, we will use the well-known PurchaseOrder xsd from
http://www.w3.org/TR/xmlschema-0/#POSchema. Since it’s short, we’ll just include it here:


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

  <xsd:element name="comment" type="xsd:string"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>

  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>


This schema defines a purchaseOrder object as containing shipping information (shipTo), billing information (billTo), and the ordered item information (items).
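
To make the SKU restriction in the schema a bit more concrete, the same pattern can be tried out with java.util.regex. This is only an illustration of what the pattern accepts, under the assumption that Java's regex syntax behaves like XSD's for this simple expression. The class name SkuCheck is made up, and the generated beans do not contain this code.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the generated code.
final class SkuCheck {

    // Same expression as the xsd:pattern facet on the SKU type.
    private static final Pattern SKU = Pattern.compile("\\d{3}-[A-Z]{2}");

    // matcher(...).matches() tests the whole string, which mirrors
    // how XSD patterns are implicitly anchored at both ends.
    static boolean isValidSku(String candidate) {
        return candidate != null && SKU.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidSku("926-AA"));  // three digits, dash, two capitals
        System.out.println(isValidSku("926-aa"));  // lowercase letters
        System.out.println(isValidSku("9260-AA")); // four digits
    }
}
```

Only the first of the three candidates in main fits the pattern.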

Step 2. Compile the schema. The relevant portion of the build.xml looks like this:


<xjc destdir="${basedir}/target/generated-sources/xjc" extension="true">
<arg line="-Xhyperjaxb3-ejb
-Xhyperjaxb3-ejb-persistenceUnitName=PurchaseOrder
-Xhyperjaxb3-ejb-roundtripTestClassName=RoundtripTest
-Xequals
-XhashCode
-XtoString" />
<binding dir="schemas">
<include name="jaxbBinding.xml" />
</binding>
<schema dir="schemas">
<include name="po.xsd" />
</schema>
<classpath>
<fileset dir="${basedir}/lib">
<include name="*.jar" />
</fileset>
</classpath>
</xjc>


This tells xjc (the JAXB compiler) to allow extensions, use hyperjaxb3-ejb as an extension, and compile po.xsd. The binding file allows schema overrides to be custom-defined. This creates 4 classes: Items.java, ObjectFactory.java, PurchaseOrderType.java and USAddress.java. Let’s look at PurchaseOrderType.java. You’ll notice that the JAXB annotations are present on the class variables, while the JPA/Hibernate annotations are present on the accessor methods. Also, note that a field not defined in the schema, hjid, has been added. This will serve as the primary key for the table since one was not defined in the schema. Later, we’ll show you how to handle this.
For the field shipTo, the declaration looks like this:

@XmlElement(required = true)
protected generated.USAddress shipTo;

and the accessor methods look like this:
@ManyToOne(targetEntity = generated.USAddress.class, cascade = {
CascadeType.ALL
})
@JoinColumn(name = "SHIPTO_PURCHASEORDERTYPE_ID")
public generated.USAddress getShipTo() {
return shipTo;
}

And

public void setShipTo(generated.USAddress value) {
this.shipTo = value;
}

Note that shipTo declares a ManyToOne relation, which indicates that many PurchaseOrderType rows may reference one USAddress. The JoinColumn places the foreign key (SHIPTO_PURCHASEORDERTYPE_ID) on the purchase order's table, pointing at the primary key of the USAddress (its generated hjid).

Step 3. Hibernate configuration. Obviously, this will vary depending on your environment and your intended target. In this case, we will be creating a web service, served by JBoss, and connected to an Oracle database. In any case, you will need a persistence.properties file to define connection properties. For example:


hibernate.dialect=org.hibernate.dialect.Oracle9iDialect
hibernate.connection.driver_class=oracle.jdbc.driver.OracleDriver
hibernate.connection.username=user
hibernate.connection.password=password
hibernate.hbm2ddl.auto=create-drop
hibernate.cache.provider_class=org.hibernate.cache.HashtableCacheProvider
hibernate.jdbc.batch_size=0
hibernate.connection.datasource=/OracleDS


This tells Hibernate that it will connect to an Oracle9i database using the Oracle driver, via the JNDI datasource /OracleDS. Configuration of OracleDS can be found in the JBoss documentation. For our example, we’ll assume it is set up and working. Note that Hibernate allows us to replace the underlying datastore without requiring any change to the business logic. In principle, migrating to another DBMS means changing only these properties: the dialect, plus the driver and datasource that go with it.
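
As a rough illustration of swapping the datastore through configuration alone, the sketch below loads a set of properties and replaces the dialect without touching anything else. It rests on several assumptions: the DialectSwitch class and withDialect method are hypothetical, the properties are read from an inline string rather than the real persistence.properties, and org.hibernate.dialect.PostgreSQLDialect simply stands in for whatever dialect the new database needs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class DialectSwitch {

    // Returns a copy of the base settings with only the dialect replaced.
    static Properties withDialect(Properties base, String dialectClass) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty("hibernate.dialect", dialectClass);
        return copy;
    }

    public static void main(String[] args) throws IOException {
        // Inline stand-in for persistence.properties.
        String text = "hibernate.dialect=org.hibernate.dialect.Oracle9iDialect\n"
                + "hibernate.connection.datasource=/OracleDS\n";
        Properties oracle = new Properties();
        oracle.load(new StringReader(text));

        Properties other = withDialect(oracle, "org.hibernate.dialect.PostgreSQLDialect");
        System.out.println(other.getProperty("hibernate.dialect"));
        System.out.println(other.getProperty("hibernate.connection.datasource"));
    }
}
```

The resulting Properties object could be handed to Persistence.createEntityManagerFactory in place of the original one, leaving the beans and business logic untouched.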

That's enough for today. The next entry will focus on writing the business logic to use the generated beans in a web service and deploying the web service into the application server.

Monday, March 23, 2009

Evolutionary Architecture

Agile Architecture, Round 2. As I (and others) have said, "Do The Simplest Thing That Could Possibly Work" applies to more than your software. So today, we'll look at the common growth process for a system, and some of the constraints that limit each phase. Our example is a web-based, database-driven application, but can be applied to any system with persistent storage. (Thanks to mxGraph for images).



Step 1 is the absolute basic setup: everything running on one server. One webserver, one database backend. I suppose Step Zero would be no database, just files or something. But even that trivial example is enough to get us off the ground. No failover, no backup, no load handling? No problem! Now, I would not advise using this type of system in a live environment, but for prototyping and identifying performance bottlenecks, it's just fine. Get something up and running, then see how it behaves. The ultimate capacity of this configuration is limited by how big a box you can get your hands on. But most likely, one giant enterprise server would not work as well as step 2, for reasons that will be explained.

This also would be a good place to mention the value of monitoring and logging at this level. In order to identify bottlenecks, you need to be able to see how your system responds to load.



Step 2 is a very common setup. It's not that much different from step 1, except that the database and webserver are on physically separate machines. This is the first step toward separating into functionality, except the functionality is as coarse-grained as it can be at this point. You could just as easily add a clone of the first machine, but now you've got two sets of processes to manage, which we will address in the next iteration. This setup allows us to tune each machine for its specified task: application or database access.

By introducing a second machine, we've now introduced a synchronization issue between them. If one machine goes down, the second machine needs to be able to address that somehow. Fail the request? Queue and retry? The answer is "It depends" on the situation being addressed. Serving webpages, the user may wish to submit their query again. Storing credit card requests may need to be more robust. But we can handle the problem with the next step:



Step 3 could theoretically be scaled out to hundreds or thousands of servers, and Step 4 is really a subdivision of Step 3. Notice that our number of servers is increasing exponentially. We started at 1, then added a second. Now we're managing (at least) 4. The cluster implementation can be as simple or as complicated as you make it: a J2EE container-managed cluster or simple proxy software. The important things with this step are:
  • you can now scale as many servers as you may need, quickly and simply.
  • you've kept almost all single-server failures from impacting availability.
  • you're still constrained by the database master server.
This step assumes that you're still dealing with fairly homogeneous data at this point: all the data coming and going from the servers can be dealt with similarly. This type of layout is pretty common among the Less Than Huge websites, that is, excepting the eBays, Facebooks, and Googles of the world. Once you get that huge, you will probably need a unique approach to handle your traffic anyway. But this will get you 99% of the way there. At this point, you've already addressed pushing updates to your servers and handling inconsistency between them (in the previous step). But let's say your monitoring indicates that you're writing too much data for one master database to keep up with. Or let's say you identified a major performance bottleneck in the system. That's where step 4 comes into play:



We're now, finally, at a point that could be considered "architecture." Each type of process is located on its own server, connecting to a database that contains the data that it needs to perform that function. This would be the point where de-normalization may come into play. By removing the need for connecting the databases, the minor performance/space hit taken by denormalization would be offset by the lack of interconnection between the databases. Also at this point, there may be additional vertical stacking, separating the presentation layer from specific data-processing. Now we're into the classic 3 (or n)-tier model, but the internals of those layers can scale as large as we want (within some physical limits).

So to sum up, your architecture should "grow" along with your application. It should provide a framework that allows your application to handle growth, not restrict it to growing along a specific path.

Tuesday, March 10, 2009

Schedule, LOC, and Budget? You're Lying

Software engineering is an imprecise science. Anyone who tells you differently is either not working very hard, or doesn't understand that fact (or both). Actually, calling it engineering is a bit of a misnomer itself, since the engineering work in building systems is as different as engineering a submarine vs. a bridge. Engineering involves creating repeatable systems of common features. That is, there are a set of rules for building sets of bridges that have little in common with the rules for building submarines.

The problem with defining software as engineering lies in the identification of the dividing line between engineering and manufacturing. Determining the amount of effort (time, code, and people) for software is much less defined than traditional "peoplespace" engineering because of the fluidity of the tools that are used. Imagine writing a proposal to build the first transcontinental railroad. 50 miles per year, thousands of workers. Now imagine the proposal today: much more production, far fewer workers. Computers allow the creation of these monstrously efficient production mechanisms. Hence the statement "hardware is cheap, people are expensive."

Looking at schedules, we see there are two types of release schedules: time-constrained (release on a certain date, like Ubuntu), or effort-constrained (all these features are in release X, like, say, Apache Httpd). Time-constrained releases on a set date, with a variable number of features. Effort-constrained delivers when the set features are done. Neither schedule mechanism has any concern with people or time. So either you release on a scheduled date, with whatever is done at that time, or you release when everything is done, regardless of date.

It would be silly to create a release schedule that released every 10,000 lines of code, but that's what our snake-oil salesmen are proposing. Here's how it works: a budget is calculated from the estimated lines of code, multiplied by the man-hours each line takes to create, based on historical estimates of productivity. So, like Earl Scheib, you're saying "I can build that system with 100,000 lines of code!" Calculating budget is now a function of hourly billing rate, productivity, and estimated lines of code.

Here's an example: customer says "build me a new SuperWidget." Contractor looks thoughtful and says "We think SuperWidget should take 50,000 lines of code, since it's kind of like TurboSquirrel we built before. Our normal efficiency is 1 LOC per hour (seriously), and billing rate is $200 per hour. So budget is $10 million. There are 2,000 man-hours per year, so we need 25 people and you can have it in one year." There's your budget, schedule, and estimate, all rolled into one. If one changes, the others have to change as well. SuperWidget is a large project, obviously.
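
The arithmetic behind that quote is simple enough to write down. The sketch below just plugs the SuperWidget numbers into three small functions; the class and method names are invented for illustration, and nothing here is meant as an endorsed estimating method.

```java
// Hypothetical calculator for the LOC-based estimate described above.
final class LocEstimate {

    // Total effort in hours: lines of code divided by lines produced per hour.
    static double effortHours(long linesOfCode, double locPerHour) {
        return linesOfCode / locPerHour;
    }

    // Budget in dollars: effort multiplied by the hourly billing rate.
    static double budgetDollars(long linesOfCode, double locPerHour, double ratePerHour) {
        return effortHours(linesOfCode, locPerHour) * ratePerHour;
    }

    // Head count needed to finish in the given number of years.
    static double peopleNeeded(long linesOfCode, double locPerHour,
                               double hoursPerPersonYear, double years) {
        return effortHours(linesOfCode, locPerHour) / (hoursPerPersonYear * years);
    }

    public static void main(String[] args) {
        // SuperWidget: 50,000 LOC, 1 LOC per hour, $200 per hour, 2,000 hours per year.
        System.out.println("budget: " + budgetDollars(50000, 1.0, 200.0));
        System.out.println("people: " + peopleNeeded(50000, 1.0, 2000.0, 1.0));
    }
}
```

With these inputs the budget comes to $10 million and the head count to 25 for a one-year schedule. Because every output hangs off the same effort figure, doubling the assumed productivity halves both the budget and the head count, which is why the productivity number is the one worth arguing about.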

That's seriously how it works. Oh sure, there's supposed to be more analysis to determine how similar the project will be to previous projects, but the number is still, at best, a guess. It's not like building a bridge: you can't estimate (very well) the amount of steel and concrete, simply because software is not bound to physical properties.

So how do you get this model to work? The "fudge factor" is in the productivity. You set your productivity so low that you know, absolutely, you won't miss the estimate. Why do you only produce 1 LOC per hour? StarOffice, considered one of the largest open source projects, is estimated around 9 million LOC. That's 4,500 man-years, using our previous figures of 1 LOC per hour and 2,000 man-hours per year. 4.5 years with 1,000 developers, or 45 years with 100 developers. Obviously, something else is going on. Estimates show that productivity can be around 10 times our estimate. But how do you get there? That's the topic for next time.

Friday, March 06, 2009

Score Your Work!

Today's topic will be a quiz: how much does your employer suck? Total up the points at the end (JavaScript if I feel nice).

1. Cafeteria - Does your company:
offer tasty, fresh food upon request for reasonable price, including free? (1 point)
offer mediocre food at their convenience? (0 points)
employ LunchLady Doris? (-1 point)

2. Security - Does your company:
treat you like an adult, and keep physical security to a minimum (badge readers, etc)? (1 point)
Have some sort of obviously CYA policy in place involving security guards? (0 points)
check your badge at the gate, again at the building, then again at the entrance to your office? (-1 point)

3. Work Environment - Does your company:
Have modern, clean, up-to-date, large, well-lit work areas? (1 point)
Have old, dirty, out-of-date, small, dark work areas? (0 points)
Have all of the previous and have on-going construction so it sounds like you're working on a major street at rush hour? (-1 point)

4. Equipment - Does your company:
provide you with what you need to do your job, and provide an easy mechanism to acquire it? (1 point)
begrudgingly provide the minimum of what you need, but make the process so tedious that it's usually not worth the effort? (0 points)
prefer to maintain a death-grip over the infrastructure, in order to maintain their need to exist? (-1 point)

5. Management - Does your company:
have a reasonable number of managers, who are willing and capable of working with you to effect change in the way the company does business? (1 point)
have too many managers, some of whom have no noticeable effect on productivity? (0 points)
have so many managers that they create a totally separate reporting chain to separate work and labor-related issues? (-1 point)

More to come!

Thursday, February 19, 2009

The Blob Problem

No, that's not a typo of Blog. The Blob Problem goes hand in hand with scope creep. As a project grows, it grows from a well-defined box into a poorly defined blob. The box, and blob, correspond to the answers to the questions "What does it do?" and "How does it do it?" From the beginning, the box clearly defines the application's domain, and the volume inside the box (I'm already stretching the metaphor) is open and visible.

As the project grows, "dark areas" of the system appear, and the box becomes less transparent, more translucent. It grows out of its original area, and starts stretching to join other areas of interest. Rather blob-like. Over time, the function and scope of the original project have morphed into something largely different.

So what do you do? As always, Do The Simplest Thing That Could Possibly Work. So the answer is, "It depends." But do what you like. I'm personally fond of a wiki with some design documentation in it. The debate about whether UML fits into DTSTTCPW still rages on, but I like to use UML more like a sketch tool. I don't care if it's correct, or that all aspects of the design are analyzed. If it explains "X talks to Y," or "X does A, B, then C" better than words would, then it covers DTSTTCPW.

Your system is still a set of Legos. Once it becomes one large Lego, the internal shape needs to be described, so that someone can use it and/or work on it without a lot of extra effort. The amount of documentation that is needed is surprisingly low, and for good reason: too much documentation is worse than too little. By trying to plow through a mountainous tome of documentation, you're wasting your developers' time. But by creating, reviewing, and maintaining that stack, you're wasting everyone's time.