Archive for the ‘Software’ Category.

The Mythical 5%

Bruce Eckel, author of Thinking in C++, relays a commencement address he gave.

First, some bad news:

The statistics are sobering: 50-80% of programming projects fail. These numbers are so broad because people don’t brag about their failures, so we have to guess. In any event, this makes the world sound pretty unreliable. …

Many projects that do somehow “succeed” still resemble sausage: you don’t want to see how it was made, or what’s in it.

Now, some good news:

An even more fascinating metric is this: 5% of programmers are 20x more productive than the other 95%. If this were a science, like it claims, we could figure out how to get everyone to the same level.

Differences this drastic do exist. Talent, experience and education provide raw material. But they must be cultivated.

So how do you become one of these mythical 5%?

These people are not those who can remember all the moves and have fingers that fly over the keyboard erupting system commands. In my experience those in the 5% must struggle to get there, and struggle to stay there, and it’s the process of continuous learning that makes the difference.

Because of what I do, I’ve met more than my share of these people. They read a lot, and are always ready to tackle a new concept if it looks worthwhile. I think if they do go to conferences they’re very selective about it. Most of their time is spent being productive, figuring things out.

The big issue is knowing that you’re going after that 20x productivity increase. Which means getting leverage on everything you do. Never just “bashing something out,” but using the best tools, techniques, and ideas at your disposal. Always doing your best.

…Being able to analyze and understand a situation and discover the hinge points of a problem is essential; this takes a clear mind and detached perspective. For example, sometimes the choice of programming language makes a huge difference, but often, it’s relatively unimportant. Regardless, people will still spend all their time on one decision while something else might actually have a far greater influence. Architectural decisions, for example.

Well put. Understand the problem well. Avoid the temptation to shoehorn in whatever seemed to work last time. Think, too: what would you do differently next time? How would you solve the problem more clearly? How would you make the source code (every line) clearer?

Often, you need to “bash[] something out.” Do it in the smartest way possible, despite the pressures to deliver. Hesitating to dive in at the appropriate time is called analysis paralysis: you might need a very imperfect first cut to get past that.

One factor in the 20x difference is learning what you can live with, and fixing what you can’t. Rewriting (let’s say “refactoring”) a piece of code can be painful, but there’s also a cost to living with it. It can be an agonizing call.

Gerald Weinberg … is most famous for saying “no matter what they tell you, it’s always a people problem.”

Usually the things that make or break a project are process and people issues. The way that you work on a day-to-day basis. Who your architects are, who your managers are, and who you are working with on the programming team.

An appropriate finish:

You’ll need to make a lot of mistakes in order to figure things out. So be humble, and keep asking questions.

CastleCops Program List

I get a strange message about a program crashing. I don’t recognize its name (swdsvc.exe). Is it legit or not? When I google it, I don’t know if I should trust the web-sites that come up any more than the mysterious executable. What to do?

CastleCops keeps very valuable lists of the programs and other things you’d expect to find on your PC, and of those you wouldn’t.

Hint: Use the search box. There are a lot of entries.

(As it turns out, swdsvc.exe is SpyWareDoctor.)

Update 1/26/2009: CastleCops’ useful information is now at SystemLookup.com.

Automating Embarrassment

I recently received this memo:

…unbeknownst to us as it only affected certain files and only a few [investors] … our data was unexplainably fatally contaminated. We have worked diligently, along with our software vendor, to fully correct the problem.

I can’t read that without feeling the sting of being in their shoes.

Whose fault was it, the software vendor’s or the user’s? Does it matter? The software vendor’s client bears the brunt of this embarrassment in front of its own clients. That’s everybody’s problem.

I won’t name names: I’m not out to further embarrass them.

My point isn’t to gawk, but to hear the clarion call to vigilance:

  • The design must be sound.
  • The implementation must be bullet-proof.
  • The code must be as clear as possible.
  • No undefined behavior.
  • No mysteries.
  • Find a problem? (Even with the design?) Fix it. Now.
  • Are users doing things you don’t expect? Resolve it.
  • Do your tests cover what could really happen?

The pressure to deliver is constant, but has to be weighed against what you’re really delivering.

One of modern computing’s founding fathers says:

Computing’s central challenge, viz. “How not to make a mess of it,” has not been met.

—Edsger Dijkstra (November, 2000)

Who Manages the Managed Language?

Managed languages like Java and C# try to help us by tracking and freeing our memory for us. Sounds good, but I don’t like it.

Maybe my problem is philosophical. To me, it seems you now have a butler quietly following you around cleaning up after you. When you’re done with something, you drop it and move on (like a kid dropping a toy on the floor when he loses interest): the butler sees it and puts it away. You and the butler don’t talk.

It breeds a reliance that I don’t think is good.

I don’t think there’s any substitute for the programmer managing his/her own resources, including memory. You bring a “managed language” into the picture to make your life easier, but now you’re troubleshooting very subtle ways you’re interacting with it.

Case in point:

DARPA Grand Challenge team member Bryan Cattle describes a nasty memory problem that cost them a shot at the $2 million prize.

Actually, most of our code is written in garbage-collected C#, so it wasn’t a memory leak per se, but it wasn’t until two weeks later that we discovered the true problem.

It was the closest thing to a memory leak that you can have in a “managed” language. C# manages your memory for you by watching the objects you create. When your code no longer maintains any reference to the object, it automatically gets flagged for deletion without the programmer needing to manually free the memory, as they would need to do in C or C++.

Resource problems are ugly. Eventually the system thrashes and dies:

We kept noticing that the computer would begin to bog down after extended periods of driving. This problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles. The computer performance would just gradually slow down until the car just simply stopped responding, usually with the gas pedal down, and would just drive off into the bush until we pulled the plug.

The money quote, emphasis mine:

We looked through the code on paper, literally line by line, and just couldn’t for the life of us imagine what the problem was. It couldn’t be the list of obstacles: right there was the line where the old obstacles got deleted.

Murphy’s law torpedoes the work-around:

Because we didn’t know why this problem kept appearing at 40 minutes, we decided to set a timer. After 40 minutes, we would stop the car and reboot the computer to restore the performance.

On race day, we set the timer and off she went for a brilliant 9.8 mile drive. Unfortunately, our system was seeing and cataloging every bit of tumbleweed and scrub that it could find along the side of the road. Seeing far more obstacles than we’d ever seen in our controlled tests, the list blew up faster than expected and the computers died only 28 minutes in, ending our run.

Memory leaks can be deadly subtle even without a managed language. Maybe in the final analysis the managed language wasn’t technically to blame. But it clouded the picture, and their reliance on it seemed to be the core of the problem.
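
To make the failure mode concrete, here is a minimal Python sketch of how a garbage-collected program can still "leak." The excerpt doesn't say what actually kept the team's obstacles alive, so the second set of references below (a list of event subscribers) is purely an assumption for illustration, and `Obstacle`, `Tracker`, and `subscribers` are invented names.

```python
import gc
import weakref


class Obstacle:
    """Hypothetical stand-in for something the car's sensors detected."""

    def __init__(self, name):
        self.name = name

    def on_tick(self):
        pass  # would react to a periodic event


class Tracker:
    def __init__(self):
        self.obstacles = []    # the list everyone knows about
        self.subscribers = []  # a second, easily forgotten set of references

    def add(self, obstacle):
        self.obstacles.append(obstacle)
        # Assumption: each obstacle also registers a callback somewhere.
        self.subscribers.append(obstacle.on_tick)

    def forget_buggy(self, obstacle):
        # The visible "delete" line: removes the obstacle from one list only.
        self.obstacles.remove(obstacle)

    def forget_fixed(self, obstacle):
        self.obstacles.remove(obstacle)
        # Also drop the callback, which holds a reference to the obstacle.
        self.subscribers.remove(obstacle.on_tick)


def still_alive_after(forget_name):
    """Return True if the obstacle is still in memory after being 'forgotten'."""
    tracker = Tracker()
    obstacle = Obstacle("tumbleweed")
    tracker.add(obstacle)
    probe = weakref.ref(obstacle)  # observes the object without keeping it alive
    getattr(tracker, forget_name)(obstacle)
    del obstacle                   # drop our own reference
    gc.collect()
    return probe() is not None


print(still_alive_after("forget_buggy"))  # the callback still pins the obstacle
print(still_alive_after("forget_fixed"))  # nothing references it, so it is reclaimed
```

The collector is doing exactly what it promises in both cases. It simply can't free an object the program still references, however obscurely.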

I’ve spoken with a few people who manage large Java projects. They usually wind up restarting the virtual machine(s) on a regular basis, as the memory footprint seems to grow without bound. Or they try to assert more and more control over garbage collection, usually to control performance.

Maybe I’m not being fair. After all, how many potential bugs have managed languages averted? There’s no way to measure. Nor can we count the bugs like the one described above.

Via Slashdot.

P.S.: I work with managed languages whenever my clients request it, and realize they offer more than just memory management.

Not even Google

From the Google code blog, emphasis mine:

In 2005 we launched Google Code to provide a home for our developer and open source programs. Two years, dozens of new products and new programs, and one major redesign later, Google Code is bigger and more dynamic than ever.

Two years operating and they’ve redesigned it once already.

I don’t point this out to embarrass Google but to show that redesigns are necessary from time to time. No one is omniscient, not even Google. As they better understand their mission, direction, operations, or issues, they find that what they had designed is no longer sufficient. And it’s not to be shoehorned or just endured, but fixed. Redesigned if need be.

The decision to redesign can be agonizing; the time required painful; the expense daunting. It’s not a decision to be taken lightly. It’s tempting to blame yourself, saying, “If only I’d seen a little further into the future.” But we’re finite beings: we don’t stand a chance. There’s no shame in that.

In fact, I think one predictor of a project’s success is how willing people are to dive in and fix things instead of trying to live with real problems.

It might take all the courage you can muster to bring it up, but you owe it to yourself and your project to give your honest assessment.

Getting your mind around it

Paul Graham gives an excellent inside look at programming. He calls it “Holding a Program in One’s Head”:

A good programmer working intensively on his own code can hold it in his mind the way a mathematician holds a problem he’s working on. Mathematicians don’t answer questions by working them out on paper the way schoolchildren are taught to. They do more in their heads: they try to understand a problem space well enough that they can walk around it the way you can walk around the memory of the house you grew up in. At its best programming is the same. You hold the whole program in your head, and you can manipulate it at will.

I call this getting your mind around it. I don’t try to hold the whole program in my head, but rather its key objects: how they should interact; which features matter; which objects handle which requirements[1].

That’s particularly valuable at the start of a project, because initially the most important thing is to be able to change what you’re doing. Not just to solve the problem in a different way, but to change the problem you’re solving.

Your code is your understanding of the problem you’re exploring. So it’s only when you have your code in your head that you really understand the problem.

You labor to intimately understand the entire problem, even though it’s impossible. So (as he says) you must mentally explore various directions.

It’s not easy to get a program into your head. If you leave a project for a few months, it can take days to really understand it again when you return to it. Even when you’re actively working on a program it can take half an hour to load into your head when you start work each day. And that’s in the best case. Ordinary programmers working in typical office conditions never enter this mode. Or to put it more dramatically, ordinary programmers working in typical office conditions never really understand the problems they’re solving.

Distractions can be deadly, though I wouldn’t put it this dramatically. Programmers working in “typical office conditions” do need to get creative to carve out the mental resources they need.

Oddly enough, scheduled distractions may be worse than unscheduled ones. If you know you have a meeting in an hour, you don’t even start working on something hard.

Absolutely.

Sometimes when you return to a problem after a rest, you find your unconscious mind has left an answer waiting for you.

True.

Harness the power.

The more succinct the language, the shorter the program, and the easier it is to load and keep in your head.

Today’s C++ compiler is an incredibly powerful tool. Use it. Exploit it. Harness its power. Its goal is to help you solve the problem, managing details and catching many kinds of mistakes along the way. Let it.[2]

Your code’s values. (Clarity, clarity, clarity.)

You can magnify the effect of a powerful language by using a style called bottom-up programming, where you write programs in multiple layers, the lower ones acting as programming languages for those above. If you do this right, you only have to keep the topmost layer in your head.

“Programming languages” sounds a bit cryptic to my ears. I prefer building blocks. The lower layers should be good building blocks for the higher ones.

Your code should be “values-oriented:”

  • Your first responsibility: make your caller’s code clear.
  • Your second responsibility: be as clear as possible yourself.

Trading a lower layer’s clarity for a higher layer’s is almost always the right thing to do. Your caller’s code should read like pseudo-code, marching as clearly through its sequence as possible. Same’s true for you.
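
A small Python sketch of what that can look like. The task (a word-frequency report) and every function name are hypothetical, chosen only to illustrate the layering: the lower functions are the building blocks, and the top one reads close to pseudo-code.

```python
from collections import Counter

# Lower layer: small building blocks, each with one clear job.

def words_in(text):
    """Split text into lowercase words, dropping surrounding punctuation."""
    stripped = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [word for word in stripped if word]


def most_common(words, limit):
    """Return the `limit` most frequent words with their counts."""
    return Counter(words).most_common(limit)


def format_line(word, count):
    """Render one report line."""
    return f"{word}: {count}"


# Top layer: reads like pseudo-code because the details live below.

def frequency_report(text, limit=3):
    words = words_in(text)
    top = most_common(words, limit)
    return [format_line(word, count) for word, count in top]


print(frequency_report("Clarity, clarity, clarity. Make it clear; keep it clear."))
```

The punctuation handling in `words_in` is the fiddly part, and it stays out of the caller's sight. That is the trade: the lower layer absorbs the detail so the upper one can march through its sequence.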

How would you do it differently? (Make it so!)

Paul writes:

Keep rewriting your program. Rewriting a program often yields a cleaner design. But it would have advantages even if it didn’t: you have to understand a program completely to rewrite it, so there is no better way to get one loaded into your head.

It almost sounds like he’s endorsing rewriting just to warm up your fingers, but that’s not his point.

A delicate issue requiring sober judgment. There’s an obvious cost to rewriting code (your time). But there’s a cost to living with a piece of code, too (and it’s open-ended and perhaps very painful). As you learn better what it needs to do, you may need to radically change it. My bias, like Paul’s, is to make those radical changes. (And don’t blame yourself for lacking omniscience.) Yes, your boss may have a heart attack, but it could save him from one too.

I’ve rambled enough. Give Paul’s piece a read.

Via Bruce

Notes:

[1] — He agrees, saying later, “If you [build your code] right, you only have to keep the topmost layer in your head.”

[2] — I don’t mean to slight your favorite language.

xcopy vs. rsync

xcopy vs. rsync, rsync vs. xcopy

Though rsync has capabilities that Win32’s xcopy only dreams of, how do the two stack up when compared apples to apples?

My test: synchronize a large collection of files, between two different local disks. 8.19 GB of data in 11,072 files across 182 directories.

My platform: a Dell Optiplex 740, AMD Athlon 64 X2 Dual Core 5200+, 2.61 GHz, 4GB RAM, Windows XP SP 2, latest updates applied.

I’m running rsync version 2.6.9 under CygWin. I don’t think rsync’s Unix origins handicap it, but being forced to work through the CygWin DLLs just might. No networking or data compression, as that’s unnecessary here and would only slow it down.

Command lines:
    xcopy C:\TestSrc F:\tmp\xctest1 /D /E /C /I /Q /H /R /K /O /Y
    C:\Software\Open\CygWin\bin\rsync -q -a -r /cygdrive/c/TestSrc /cygdrive/f/tmp/rstest1

Results:

Building the directory from scratch:
	xcopy:	4:59.42 (min:sec)
	rsync:	6:11.95 (min:sec)
Updating one file somewhere in the directories:
	xcopy:	1.70 sec
	rsync:	2.98 sec
No files to update:
	xcopy:	1.33 sec
	rsync:	2.22 sec
Updating three files somewhere in the directories:
	xcopy:	1.25 sec
	rsync:	2.78 sec

These numbers don’t take into account the fact that XP caches the directory entries off the disk the first time they’re referenced. That caching penalizes whichever tool (xcopy or rsync) runs first by 10 seconds.

Conclusion: Though xcopy moves bulk data roughly 25% faster than rsync on its native Win32, rsync keeps up in all other respects. In the typical case (for me) where some small subset of files has changed, both finish in under three seconds, with rsync trailing by a second or so. So using rsync instead of xcopy wouldn’t put me at a meaningful performance disadvantage.
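
The 25% figure can be checked against the from-scratch timings above. A quick Python sketch of the arithmetic (the helper and variable names are mine, and the MB/s values assume 1 GB = 1000 MB, which doesn't affect the ratio):

```python
def to_seconds(stamp):
    """Convert a 'minutes:seconds' string such as '4:59.42' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


DATA_GB = 8.19                     # size of the test tree, from the post
xcopy_s = to_seconds("4:59.42")    # full copy with xcopy
rsync_s = to_seconds("6:11.95")    # full copy with rsync

xcopy_rate = DATA_GB / xcopy_s     # GB per second
rsync_rate = DATA_GB / rsync_s

# How much faster xcopy moved the same data, as a fraction of rsync's rate.
speedup = xcopy_rate / rsync_rate - 1

print(f"xcopy: {xcopy_rate * 1000:.1f} MB/s, rsync: {rsync_rate * 1000:.1f} MB/s")
print(f"xcopy throughput advantage: {speedup:.1%}")
```

The advantage comes out at a little over 24%, in line with the rounded figure in the conclusion.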

Keep in mind that dealing with CygWin’s paths isn’t for the faint of heart.

This is a quick and dirty benchmark. No averaging or further exploration than what you see above, though the numbers seemed consistent across a few runs.
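
For a less quick-and-dirty version, something like the following Python sketch could repeat each command and summarize the timings. It is not the method used for the numbers above: `time_command` and `summarize` are hypothetical helpers, and the command in the usage lines is only a harmless placeholder.

```python
import statistics
import subprocess
import sys
import time


def time_command(args, runs=3):
    """Run a command `runs` times and return the wall-clock times in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, fastest, slowest) for a list of timings."""
    return statistics.mean(timings), min(timings), max(timings)


# Placeholder command: start the Python interpreter and exit immediately.
# Substitute the xcopy or rsync command line as a list of arguments.
mean, fastest, slowest = summarize(time_command([sys.executable, "-c", "pass"]))
print(f"mean {mean:.2f}s (min {fastest:.2f}s, max {slowest:.2f}s)")
```

Given the directory caching mentioned above, it may be worth discarding the first run of each tool, or at least reporting it separately.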

An impressive performance by both rsync and CygWin.

Hope it helps.

[8/18/2008] P.S.: Important: Does your xcopy seem slow? Having it output each file’s name slows it down by orders of magnitude. Use /Q to make it run silently, once you’re convinced it’s doing the right thing.

Where are the Good Programmers?

Frank Wiles discusses hiring programmers.

  • Finding good programmers is hard in any language, and a good programmer can be as effective as 5-10 average programmers.
  • You don’t need to hire an expert in language X, you can and should look for expert programmers that are willing to learn language X. An expert can easily cross over from being a novice in any language in a matter of a few weeks.

More:

What is an expert programmer?

Experience is key, but not necessarily in ways you might imagine. Time in the saddle, with a particular language is not as important as diversity of experience. Someone who has worked in several disparate industries, a generalist, is often a much better developer than one who has spent years in the same industry. There are exceptions to this, but in general I have found this to be the case.

If hiring and managing software developers is something you do, the article is well worth the read.

Via Slashdot

Unix Man Pages

Not man as in manly/male, but manual. You know, user’s guide.

Here are a few on-line “man-page” repositories that are pretty useful. I just came across SoftwarePlug.com now, and it looks like a best-of-breed resource. Man and info pages for a slew of operating systems, and not a Google ad in sight.