When the Curiosity rover successfully landed on Mars, the engineers at JPL became technology's new rock stars.
It was an amazing accomplishment, and all the people on the Mars Science Laboratory team at JPL deserve accolades.
From my perspective as a long-time observer of the engineering software industry, I attribute much of their success to a combination of smart people, good tools, and strong processes.
Yet, something has been bothering me about Curiosity. In a word: Teflon.
A month or so before Curiosity landed, several publications ran articles about a small problem: dirt samples could become contaminated with polytetrafluoroethylene (Teflon) from the seals on the rover's drill.
According to an article in the Christian Science Monitor, NASA discovered the problem shortly before the rover's November 26, 2011 launch. The problem reached the media around June 12 this year, roughly six months after it was discovered. I'm guessing the announcement was timed so it would not overshadow the excitement of Curiosity's landing.
It's not the problem itself that bothers me. It's the question of how NASA missed it until it was too late to do anything about it.
Two things NASA does really well are simulation, and verification and validation (checking that a system meets its requirements and specifications, and that it fulfills its intended purpose).
Could it be that NASA tripped up on simulation, or verification and validation?
That was my initial guess. But, last week, I had a chance to talk to Doug McCuistion, NASA's Director of the Mars Exploration Program, and ask him about the Teflon problem.
McCuistion pointed out the major factor that contributed to the late discovery of the problem: in the lead-up to the Mars Science Laboratory launch, the drill was redesigned a couple of times, based on lessons learned from the earlier Spirit rover. The potential for Teflon wear was first discovered during life testing of the final design. And because the redesigns pushed back life testing, the problem wasn't found until it was too late to fix before launch.
NASA, of course, could have delayed the launch of the Mars Science Laboratory to address the problem. The catch is that Earth and Mars are aligned properly for such a mission only once every 26 months. A delay would have cost about $700 million.
What NASA could do before the launch was analyze the problem to see whether it was significant. Teflon, after all, is an organic compound, and its presence could distort soil analyses. In short, NASA scientists determined that they could “read through” any contamination. The problem was manageable, so there was no need to delay a mission that had already cost $2.5 billion.
According to McCuistion, when a program's budget, schedule, and function are all fixed, the only variable is risk. NASA happens to understand risk quite well, and actively manages it.
While simulation may be critical to achieving function, it also serves to “buy down” risk. In the case of Curiosity, simulation worked great for getting the rover onto the surface of Mars and making it capable of meeting its functional requirements. Where it fell short was in predicting wear. That took life testing.