Saturday, February 27, 2010

Slain by XP Guardian 2010

Somehow I got infected by XP Guardian 2010.

Ugh.

I'm not sure what it was, exactly, but it slipped onto my computer when I wasn't looking and lodged itself into my Windows Security Center.

Rather disturbingly, the infection seemed to occur within hours of upgrading that machine to Firefox 3.6. Firefox has been incredibly reliable for me over the past few years, so this must be a red herring. Still, it's the only significant change I can recall making to the machine in the days leading up to the infection.

I fought with it for an hour or so, then bit the bullet and used the "Access IBM" button on the Thinkpad to completely restore the computer to the factory state.

Then I spent about 4 more hours running Windows Update, and re-installing the core applications that I really care about.

It was time, anyway; I hadn't rebuilt this computer in about 4 years.

I love this Thinkpad T43p; it's been one of the best computers I've ever used.

Now I'm back, running XP Service Pack 3, with Thunderbird and Google Chrome.

Wednesday, February 24, 2010

FAST 2010

The Usenix File and Storage Technologies conference is underway this week.

The complete technical proceedings are online! Hurray for Usenix for publishing this information, so that those of us who can't attend conferences like these can still learn from them.

The general themes of the conference appear to be:

  • Performance (parallelism, flash memory, filesystem design)

  • Efficiency, energy usage, and green computing

  • Cloud computing: provisioning, management, de-duplication, etc.

  • Error handling, data reorganization (de-duplication also fits in here) and recovery



Now I just have to find some time to dig into the proceedings...

A homage to constrained writing

This is the geekiest paean you will read today, this week, this month, or probably this year.

Still, I enjoyed the short essay, and even more enjoyed following the links to all the D.R.H. material. Perhaps it's time to go try Metamagical Themas, for I have such good memories of GEB:EGB.

Most interesting tidbit I discovered, having a son who presently resides in Chico, was this:

for many vacations, a handful of my pals and I would go up with Laura and my folks to our family’s ranch in Flournoy, in north California, not far from Corning, in softly rolling hills with lots of oaks. Laura had fun riding Chico, my folks had fun rounding up and branding cows and bulls, and all of us had fun chatting, hiking, skipping rocks, playing darts and “foot carroms”, arranging and lighting kindling and logs, fixing roofs, tossing hay to always-hungry cows, and so on.

I know the feeling; for many years, I would bundle up my wife and children and drag them down to Ridgecrest, to our family's not-quite-a-ranch, for their annual exposure to mountains and rocks and wind and sand and a view that went on and on for seventy miles from the front doorstep.

I haven't yet made it to Flournoy, but since I've mostly exhausted the various short day trips near Chico, it's good to now have another one to put on the list.

Monday, February 22, 2010

TCP Offload Engines

I happened across an interesting report of a problem that was ultimately determined to be due to interactions with a TCP Offload Engine.

The problem report itself was fascinating, partly because I occasionally see very odd network behaviors that I don't understand (unfortunately I haven't had as much luck making them reproducible and resolving them), and partly because I had never heard of TCP Offload Engines before.

So I did a bit of searching and found some quite interesting information. There are definitely a variety of sources, including some very respectable ones, which describe the potential of TCP Offload in glowing terms.

However, there are also some very skeptical viewpoints, such as this article describing in detail why most Linux systems don't support TOE. I thought that the most telling point in this document was the observation-from-history about the eternal tradeoff between custom hardware solutions and general software solutions:

Each TOE NIC has a limited lifetime of usefulness, because system hardware rapidly catches up to TOE performance levels, and eventually exceeds TOE performance levels. We saw this with 10mbit TOE, 100mbit TOE, gigabit TOE, and soon with 10gig TOE.


Also, the essay that I linked to at the start of this post, which originally got me interested in TOE, refers to this support document, which mentions that:

  • TCP/IP Offload has a problem with the Window Scaling feature. This problem typically occurs when you communicate with a Windows Vista-based computer. Windows Vista uses the Window Scaling feature.

  • Some TCP/IP Offload-enabled network adapters do not send TCP keep-alive messages. However, Exchange servers use TCP keep-alive messages to clean up inactive client sessions.

  • The TCP/IP Offload-enabled network adapter may consume lots of nonpaged pool memory. This may cause other problems in the operating system.

  • In some cases, the TCP/IP Offload-enabled network adapter may request large blocks of contiguous memory. This makes the computer stop responding when it tries to free the memory.
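
For anyone who, like me, needs a refresher on the Window Scaling feature mentioned in the first bullet: the TCP header only has 16 bits for the receive window, and the window scale option lets each side agree on a shift count (at most 14) that is applied to that field. Here is a minimal sketch of the arithmetic in Java. The class and method names are mine, made up for illustration; this is not code from any TOE driver or network stack.

```java
// Illustrative arithmetic only; the class and method names are invented for this post.
final class WindowScalingSketch {
    static final int MAX_WINDOW_FIELD = 0xFFFF; // the header field is 16 bits wide
    static final int MAX_SHIFT = 14;            // largest shift count the option allows

    // Effective receive window, in bytes, for a header value and a negotiated shift count.
    static long effectiveWindow(int windowField, int shift) {
        if (windowField < 0 || windowField > MAX_WINDOW_FIELD) {
            throw new IllegalArgumentException("window field must fit in 16 bits");
        }
        if (shift < 0 || shift > MAX_SHIFT) {
            throw new IllegalArgumentException("shift must be between 0 and 14");
        }
        return ((long) windowField) << shift;
    }

    public static void main(String[] args) {
        System.out.println(effectiveWindow(65535, 0));   // unscaled maximum
        System.out.println(effectiveWindow(65535, 14));  // scaled maximum, roughly 1 GB
    }
}
```

Presumably, if the offload hardware and the operating system ever disagree about the shift count, the two ends of the connection end up with very different ideas of how much data may be in flight, which would be consistent with the sort of odd stalls the support document describes.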



It seems like this is a fairly controversial bit of technology, with rather wide-ranging opinions on whether the technology is a benefit or a hindrance. Modern systems continue to become more complex, and evaluating their success or failure becomes more complex as well.

Many years ago, when I worked in the database world, we had a variety of partners who were trying to implement various bits of the DBMS technology in hardware, in the hopes that their hardware-augmented systems would outperform the pure software solutions. In the DBMS world, it seems, we learned this lesson a long time ago:

There was overwhelming sentiment that research on hardware data base machines was unlikely to produce significant results. Some people put it more strongly and said they did not want to read any more papers on hardware filtering or hardware sorting. The basic point they made was that general purpose CPUs are falling in price so fast that one should do DBMS functions in software on a general purpose machine rather than on custom designed hardware. Put differently, nobody was optimistic that custom hardware could compete with general purpose CPUs any time in the near future.


It will be interesting, now that I'm aware of it, to keep an eye on this TOE technology, and see if it suffers the same fate that the DBMS custom hardware technology did, 25 years ago.

Sunday, February 21, 2010

Virtual Execution Environments 2010

Over the past few years, I've been fascinated to watch the convergence of three major areas of modern computing research:


  • Language-interpreting virtual machines, such as the Java Virtual Machine, the Flash engine, and the .NET runtime

  • Web browser environments, with their so-called Dynamic HTML support: JavaScript, HTML 5, CSS

  • Complete system virtualization engines, such as VMware, Xen, and VirtualBox



This year's VEE 2010 is a postcard from the bleeding edge of the convergence. The preliminary program is now online.

Simply browsing through the conference materials sets my mind aflame:

  • Development and debugging for virtual environments, such as record/replay debugging and omniscience

  • An Asymmetry-Aware Scheduler for Hypervisors

  • Holistic Memory Efficiency and Performance

  • Energy-Efficient Storage

  • Capability Wrangling Made Easy

  • A Substrate for Managed Runtime Environments



At my new day job, I'm told, each developer gets a virtual machine of his/her own for development tasks; an OVM I'm sure. It sounds great to me: who needs a physical computer when I can have a virtual one?

While virtualization is generally sold as an efficiency proposition, with its primary appeal being to bean counters, it's actually a wonderful thing for software types like me. We get increased power, increased flexibility, higher availability. I no longer have to have direct physical access to 4-5 machines of my own in order to ensure a broad range of hardware/software configurations; I no longer have to transport machines from place to place to fit my mobile work environment; I no longer have to settle on a software/hardware mix and hold it static for years. Instead, I can adjust my environment to the needs of my current project(s) and evolve them gracefully as my work needs change.

My first usage of a virtual machine was in 1985, at Computer Corporation of America in Boston, makers of the Model 204 DBMS for IBM mainframes. IBM's Virtual Machine operating system for the mainframe was, oh, 25 years ahead of its time; it's good to see the rest of the world understanding the power of this concept.

Thursday, February 18, 2010

A new progressive era is about to begin

As my eye doctor has been predicting for twenty years, now that I'm approaching age 50, I'm having increasing trouble with my near-field vision. I can still see just fine at medium distances, and my mild nearsightedness at distance hasn't changed much since college, but I can no longer convince my eyes to focus on objects closer than about 18 inches away.

So, since it happened that I had some end-of-year FSA account balance left, I went and ordered a pair of new progressive lens glasses, which should be here by early March. The doctor says it may take me a few days to get accustomed to the progressive lenses, but he's hopeful I'll be much happier with them than with my current setup.

While I was there, I also decided to get a pair of prescription sports goggles (Liberty Morpheus 2's, I think), to see if they can make any improvement in my soccer play.

Jason, I hope you won't bust a gasket if both Jeff and I are out on the field wearing our goggles!

Wednesday, February 17, 2010

Column aliasing in SQL

Consider the following example:


CREATE TABLE t (c1 int);

SELECT * FROM t AS a(a1);

SELECT c1 AS a1 FROM t AS a;


In both cases, the result of the SELECT is a table named "A" with a column named "A1".

However, the syntax for specifying the column alias differs in the two cases.

And, some other similar syntax does not seem to be legal (at least in Derby):


SELECT c1 FROM t AS a(a1);

ERROR 42X04: Column 'C1' is either not in any table in
the FROM list or appears within a join specification and
is outside the scope of the join specification or appears
in a HAVING clause and is not in the GROUP BY list. If
this is a CREATE or ALTER TABLE statement then 'C1'
is not a column in the target table.

SELECT c1 AS a1 FROM t AS a(a2);

ERROR 42X04: Column 'C1' is either not in any table in
the FROM list or appears within a join specification and
is outside the scope of the join specification or appears
in a HAVING clause and is not in the GROUP BY list. If
this is a CREATE or ALTER TABLE statement then 'C1'
is not a column in the target table.


I'm not really sure what's going on with this part of the SQL language.

Until now, I have always used the form

SELECT c1 AS a1 FROM t AS a;


I only learned about the FROM t AS a(a1) form fairly recently, and I'm still trying to understand what it means and how it is to be used:

  • Can I only use it with SELECT *?

  • How do I know what order the columns are to be named in?

  • Why would I choose to use this form as opposed to the individual column aliasing?



So much still left to learn about SQL...
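
My current understanding, which I'd want to check against the SQL standard and the Derby reference manual before relying on it, is that a column list after the table alias renames every column of the table positionally, in the order the columns were declared in CREATE TABLE, and that the original column names are no longer visible through that alias. That would explain both errors above: once the alias carries a column list, C1 is hidden. Here is a toy Java model of that rule. It is not Derby code, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model (not Derby code) of which column names "FROM t AS a(...)" exposes.
final class DerivedColumnListSketch {

    // tableColumns: names in CREATE TABLE order.
    // derivedColumns: names listed after the alias, or an empty list if none were given.
    static List<String> visibleColumns(List<String> tableColumns, List<String> derivedColumns) {
        if (!derivedColumns.isEmpty() && derivedColumns.size() != tableColumns.size()) {
            throw new IllegalArgumentException("column list must name every column of the table");
        }
        // With a column list, the new names replace the old ones position by position.
        List<String> source = derivedColumns.isEmpty() ? tableColumns : derivedColumns;
        List<String> visible = new ArrayList<>();
        for (String name : source) {
            visible.add(name.toUpperCase(Locale.ROOT)); // unquoted identifiers fold to upper case
        }
        return visible;
    }

    static boolean canSelect(String column, List<String> tableColumns, List<String> derivedColumns) {
        return visibleColumns(tableColumns, derivedColumns)
                .contains(column.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> t = Arrays.asList("c1");
        List<String> none = Collections.<String>emptyList();
        System.out.println(canSelect("c1", t, none));                 // FROM t AS a
        System.out.println(canSelect("c1", t, Arrays.asList("a1")));  // FROM t AS a(a1)
        System.out.println(canSelect("a1", t, Arrays.asList("a1")));  // FROM t AS a(a1)
    }
}
```

If this model is right, the form is not limited to SELECT *; I would just have to refer to the new names, as in SELECT a1 FROM t AS a(a1). The place I'd expect it to earn its keep is a table expression whose columns have no useful names of their own, such as a subquery or a VALUES clause in the FROM list.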

Friday, February 12, 2010

Tweeting the Vancouver games

Technology never stops evolving; everybody has to keep up.

This year, the International Olympic Committee found it necessary to post official guidelines regarding what was appropriate (and inappropriate) for athletes to post on their Twitter, Facebook, or blog pages during the Olympic Games.

Let's see, does

Just nailed that inverse 720, dude!

fit in an SMS message?

I think it does!

Thursday, February 11, 2010

Some changes in my work life

I'm actually quite excited about the implications of the recent changes in my day job. Finally, the software I've been pouring my life into for the last 8.5 years will be backed by the resources and market reach of the largest software company on the planet!

At a personal level, there will be a number of changes. I'm sad that I'll no longer get to commute to work with my wife; it's been a wonderful six-year run of our shared commute. And I'm sad that I'll lose my beautiful private window office. AmberPoint's executive team is a very enlightened group who really understand the importance of a great work environment for software engineering, and I've tremendously enjoyed being able to work in Greg's team.

But overall, I'm hopeful about the future, and looking forward to learning more as the details are revealed.

Tuesday, February 9, 2010

FAT32 filesystem limits, DRDA CMDCHKRM and DERBY-3729

As part of DERBY-3729, I've been diving back into DRDA, an area I hadn't visited in several years.

DRDA is IBM's Distributed Relational Database Architecture protocol. It is the client-server protocol which was chosen for the implementation of the Apache Derby network server functionality. DRDA is extremely rich and powerful, but unfortunately it is not simple. However, it is very thoroughly documented; the complete DRDA specification is available from The Open Group.

In the particular case of DERBY-3729, the issue involves a fairly simple question:

When the server runs out of disk space, or is otherwise unable to write any more data to the database, how should it notify the client?


In general, DRDA accommodates the return of error message information from the server to the client. However, in this case, the issue is complicated because of the severity of the error. In Derby, as in (probably) all standard SQL implementations, there is a range of severity levels:

  • Warnings, which are simply returned to the client following the statement execution. For example, I think "value was truncated to fit" is a warning.

  • Errors which cause the current statement to be aborted. For example, I think "unique key constraint was violated" is a statement-level error.

  • Errors which cause the current transaction to be aborted. For example, I think "deadlock occurred and you were chosen as a victim" is a transaction-abort error.

  • Errors which cause the current connection/session to be closed. These are not very common in Derby except due to internal errors. One example, though, is when you try to connect, but give invalid arguments for the connection parameters. Then you get an error and your (never-really-created) connection is closed.

  • Errors which cause the current database to be closed. These usually involve I/O errors which are affecting Derby's storage engine. Note that since Derby can support multiple databases simultaneously, it is possible that one database is on a bad disk drive while other databases are still OK, so only the failing database is shut down.

  • Errors which cause the entire system to shut down. These are extremely rare, and mostly involve internal logic errors that are detected in critical pieces of the Derby engine. For example, if an unexpected exception occurs while aborting a transaction, Derby concludes that something is horribly wrong and shuts the entire system down rather than risking further damage.
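
One way I picture the list above is as an ordered scale, where an error at a given level also implies the cleanup of everything below it. Here is a minimal Java sketch of that reading. Derby has its own internal representation of these levels; the enum and method names below are invented for illustration, and the cascading rule is my interpretation rather than something copied from the Derby source.

```java
// Illustrative only: these names are mine, not Derby's internal constants.
enum ErrorSeveritySketch {
    WARNING, STATEMENT, TRANSACTION, SESSION, DATABASE, SYSTEM;

    // True if an error of this severity tears down the given scope.
    // A warning tears down nothing, and nothing "tears down" a warning.
    boolean aborts(ErrorSeveritySketch scope) {
        return scope != WARNING && this.compareTo(scope) >= 0;
    }

    public static void main(String[] args) {
        // A database-level error takes the session, transaction, and statement with it.
        for (ErrorSeveritySketch scope : values()) {
            System.out.println("DATABASE aborts " + scope + ": " + DATABASE.aborts(scope));
        }
    }
}
```

The ordering is the useful part: the server only has to decide how severe the error is, and the set of things to roll back or close follows from that.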



Back to the case at hand: DERBY-3729.

In this case, it wasn't that the disk was full; rather, since the user had configured their system using the FAT32 filesystem format, individual files are limited to 4 gigabytes in size. Once a file gets that large, attempts to make it larger are rejected, with an error that turns into an IOException in the Derby storage engine. The IOException is caught and treated as a database-severity error, which results in:

  • The statement and its transaction are rolled back (if possible)

  • The session and its connection are closed

  • The database is shut down

  • The client is informed that a "command check" has occurred



A "command check" is the DRDA message which is used when an error of such severity has occurred that it caused the connection to be closed. It is conveyed as a CMDCHKRM, which is DRDA jargon for "command check reply message".

Unfortunately, the normal Derby client-server error message communication mechanism requires that the connection remain open, because generally the Derby server just sends a message "code" to the client, which then requests the full error message details from the server by making additional protocol calls back-and-forth. In this case, since the connection is closed, we only get one shot to convey any error message information, which is via the CMDCHKRM message itself.

It turns out that the CMDCHKRM message always contains a SQLCARD, a SQL Communications Area Reply Data object, which allows a small amount of error information to be carried inline.

So, to try to resolve DERBY-3729, I:

  • Enhanced the server-side error message text to reflect that there are other possible causes of I/O errors besides the disk being full, such as a filesystem limit (FAT32) or a quota being reached.

  • Enhanced the client-side error message text to reflect that, when a CMDCHKRM is received, there may be additional information available in the server-side derby.log file.

  • Enhanced the client code which processes the CMDCHKRM message to look for the SQLCARD, and to fetch whatever summary error message text is present in that object, and include it in the client-side exception that is thrown.
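
The client-side part of the change can be sketched roughly as follows. This is a simplified illustration, not the actual patch: the class name, method name, and message wording are all hypothetical, and the real Derby client parses the SQLCARD out of the DRDA reply rather than receiving it as a plain string.

```java
import java.sql.SQLException;

// Hypothetical sketch; none of these names come from the Derby client source.
final class CommandCheckSketch {
    static final String GENERIC_TEXT =
        "The server reported a command check and closed the connection; "
        + "see derby.log on the server for details.";

    // sqlcardMessage: summary text carried inline in the SQLCARD, or null if none was present.
    static SQLException toException(String sqlcardMessage) {
        if (sqlcardMessage == null || sqlcardMessage.trim().isEmpty()) {
            // Nothing usable arrived inline, so all we can do is point at the server log.
            return new SQLException(GENERIC_TEXT);
        }
        // The connection is already gone, so this inline text is the only detail we will get.
        return new SQLException(GENERIC_TEXT + " Server said: " + sqlcardMessage.trim());
    }

    public static void main(String[] args) {
        System.out.println(toException(null).getMessage());
        System.out.println(toException("file size limit reached").getMessage());
    }
}
```

The design point is that the client never goes back to the server for more detail in this path; whatever text rode along in the reply is folded into the exception, and the pointer to derby.log covers the case where nothing did.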



Hopefully this will help the next user who runs into this problem.

I wonder how long the FAT32 filesystem format will still be in use?
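
For what it's worth, the 4 gigabyte ceiling falls out of simple arithmetic: FAT32 records a file's size in a 32-bit field, so the largest possible file is 2^32 - 1 bytes. Here is a small Java sketch of the check a storage layer could make before growing a file. The class and method names are invented, and the 4096-byte write in the example is just an assumed page size for illustration.

```java
// Illustrative arithmetic only; not part of Derby.
final class Fat32LimitSketch {
    // A 32-bit size field means the largest file is 2^32 - 1 bytes.
    static final long MAX_FILE_BYTES = (1L << 32) - 1;

    // True if appending bytesToAppend to a file of currentSize bytes would exceed the limit.
    // Assumes 0 <= currentSize <= MAX_FILE_BYTES and bytesToAppend >= 0.
    static boolean appendWouldFail(long currentSize, long bytesToAppend) {
        return bytesToAppend > MAX_FILE_BYTES - currentSize;
    }

    public static void main(String[] args) {
        long almostFull = MAX_FILE_BYTES - 4095;  // one byte short of room for a 4096-byte page
        System.out.println(appendWouldFail(almostFull, 4096));
    }
}
```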

Thursday, February 4, 2010

I'm sorry you had to see me like that

This is perhaps the greatest TV review of all time.

Sure, it's just TV, and it's just Lost, and it's just entertainment, but what a wonderful review!

Salman Rushdie, and Gottfried Leibniz, and Soren Kierkegaard, and Dante!

When reviewers are saying things like this, we've moved beyond simple entertainment; we're definitely in the realm of Art:

One wonders if the entire season 6 side ways story line will model the general thematic thrust of the castaway story, but with different incidents and events — a gritty, more down-to-earth version of the mythic, larger-than-life Island epic, like how Dorothy's adventure in Oz was a fantastical extrapolation of her life in Kansas. Lost also loves its Alice in Wonderland references, and so we recall that Lewis Carroll's sequel to Alice's Adventures In Wonderland was entitled Through The Looking Glass, which begins with Alice gazing into a mirror and wondering if it could be portal into a topsy-turvy Otherworld. The book itself is a cracked mirror reflection of the previous book — the same story in essence, sharing similar if not identical themes, just rendered with different incident and detail.


I left my weekly soccer game early; as I was walking out the door, my teammates asked: where are you going?

I'm going home, I said. I made a promise to myself, that I was going to watch the last season of Lost.

And so I am.

Tracer-T!

Raymond, you really made my day!

Joel and Jeff on programmers who blog

During episode 81 of the StackOverflow podcast, Joel and Jeff got into a fairly interesting discussion about programmers who write blogs.

Joel encouraged programmers who blog to focus on quality over quantity. He said that, when you write an article, you should try to narrow your focus down to some little tidbit of information that you are particularly interested in and familiar with, and you should attempt to write the world's greatest article on that subject. He used the example of "chocolate-covered strawberries", and suggested that you shouldn't just write another article talking about some neat chocolate-covered strawberries that you made, but you should really dig deeply into the particular details that fascinate you, like the particular type of cocoa powder to use. He said, your goal should be to write the article that will be the #1 hit on Google for "chocolate-covered strawberries".

Jeff differed with Joel (he often does), and offered a number of alternate perspectives:

  • Firstly, he said, you should blog about what interests you, and not worry so much about what might interest others.

  • Secondly, he said, you should make the barrier to blogging fairly low. There are already plenty of obstacles to blogging, and if you spend all your time worrying that your writing isn't good enough, or doesn't deserve to be posted, you'll never blog anything at all.

  • Thirdly, he said, writing is a skill, and like any skill it must be practiced, and so you should exercise your writing skills regularly and routinely. The more you write, the better you will get at writing, so write as often as you can.

  • Fourthly, he said, simply writing about something will, often subconsciously, cause you to improve your knowledge about it. If you blog about X, in the process of doing so, you will find that you learn more about X. Partly this is because you'll find yourself doing some additional research about X; partly this is just because blogging about X will force you to organize your thoughts, and organizing your thoughts is one of the ways you learn.



Essentially, Joel said: programmers who blog should blog for their readers. While Jeff said: programmers who blog should blog for themselves.

I've only been seriously blogging for about 8 months, but I find myself strongly on Jeff's side in this area. I'm not trying to be some sort of A-list world-renowned blogger; instead, I'm trying to keep track of the particular topics that I'm interested in, practice my writing skills, record notes and observations about things that I learn, etc.

So, I'm sorry, Joel, but you're never likely to get to my blog from the top of a Google search, and if you're looking for information on chocolate-covered strawberries I've got nothing to offer. But I am feeling good about the return on investment from keeping this blog, so I intend to continue it more-or-less as it has been, for as long as I can.

Monday, February 1, 2010

Ceremony vs Essence

I was listening to one of Scott Hanselman's Hanselminutes shows the other day. I can't remember whether it was show 196 or show 197 (probably it was the IronRuby show), but in one of those shows, the conversation turned to Ceremony vs Essence.

I hadn't heard of Ceremony vs Essence before, so I've been learning about it.

Ceremony vs Essence, as I understand it, is a framework for critiquing programming languages. It appears to have been invented by Neal Ford of Thoughtworks, and delivered in a series of conference keynotes in 2008. (Unfortunately, I can't find his keynote slides available online anywhere -- anybody have a link?)

The fundamental idea behind Ceremony vs Essence appears to be that, all other things being equal, programming languages should attempt to allow programmers to clearly express the essence of their programs without being caught up in excessive ceremony imposed by the programming language.

That is, it appears to be a form of Occam's Razor: an argument for simplicity and clarity, and for avoiding unwanted extraneous clutter.

So, for example, some people look at a statically typed language like C or Java, and observe that the need to define your variables ahead of time, and specify exactly the type of the variable, is "ceremony", and distracts from the "essence" of the use of the variable. This point is made quite well by a pair of essays by Stuart Halloway:


Clint Shank makes a similar point, from a testing perspective, in his essay: Essence over Ceremony in Unit Testing.

I think that, setting aside for a moment the language debates regarding Ruby-versus-Java-versus-JavaScript-versus-whatever, there is clearly a universal point to be made: simpler is better. Simpler is clearer, simpler means less code, simpler is easier to tune, to maintain, to document, to debug.
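
To make the point concrete without leaving Java at all, here is a small illustration of my own (it is not taken from the essays mentioned above, and the class and method names are made up). Both methods have the same essence, summing a list of integers, but the first carries noticeably more ceremony.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// My own illustration of "ceremony" versus "essence"; not from the essays cited above.
final class CeremonySketch {

    // More ceremony: an explicit iterator and explicit unboxing.
    static int sumCeremonial(List<Integer> values) {
        int total = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            Integer boxed = it.next();
            total += boxed.intValue();
        }
        return total;
    }

    // Same essence, less ceremony: the for-each loop and auto-unboxing do the bookkeeping.
    static int sumConcise(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumCeremonial(values) == sumConcise(values));
    }
}
```

Neither version is wrong, but the second leaves less between the reader and what the code is actually for, which is the sort of thing a second pair of eyes tends to catch.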

One of the best aspects of the code review process that occurs during the analysis of a patch proposal on the Derby project is that each successive pair of eyeballs will look at the code and, sometimes, will notice: "Say, you know, you could write this more easily if you did this, or did that, or did such-and-such".

Nobody is a perfect programmer, and no code is ever perfect. The ongoing study, refinement, and improvement of software has no end.

And is ever-enjoyable.