Most readers of this list are probably not interested in this, but I wanted to
follow up my last posting:
First, after I sent my message I realized that I had not noticed a couple of
Robin Vowel's comments that I want to respond to:
> But the code *was* changed for Ariane 5. Protection was applied to
> other similar conversions in the vicinity of the conversion that actually
> overflowed.
The report does not state when the protection was applied. I have always
assumed that these decisions were made when the code was developed for the
Ariane 4. Any thoughtful person would have examined the code for potential
overflows as part of the development for the Ariane 4. No mention was made of
having performed this review twice, and the decisions made were obviously more
applicable to the Ariane 4 than to the Ariane 5. Of course my assumption might
be incorrect, and if there is any contrary information I would be glad to hear
of it.
> The addition of protection of overflow (a single instruction) could not have
> made any significant difference to the running time of the SRI computer.
By itself this argument is not conclusive. The impact of an instruction
depends on how often that context is executed, and the computational resources
on rockets are more limited than one might expect. While I would be surprised
if adding this instruction increased the computational requirements by more
than 1%, there were two other unprotected conversions, and many other
computational changes to decrease computer load. Any increase in the load
means either the potential of deadlock, or more difficulty in adding other
functionality later.
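
To make the conversion under discussion concrete, here is a minimal Python
sketch of an unprotected versus a protected conversion from floating point to
a 16-bit signed integer. The function names and the choice to saturate are my
own assumptions for illustration. The actual SRI code was written in Ada, and
the report does not reproduce it.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_unprotected(x: float) -> int:
    """Range-checked conversion that raises when the value does not fit.

    This stands in for a conversion whose overflow exception is left to
    whatever default handler the system provides.
    """
    n = int(x)  # truncates toward zero
    if n < INT16_MIN or n > INT16_MAX:
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return n


def to_int16_protected(x: float) -> int:
    """Saturating conversion: out-of-range inputs are clamped (an assumption)."""
    if x >= INT16_MAX:
        return INT16_MAX
    if x <= INT16_MIN:
        return INT16_MIN
    return int(x)
```

The protected version costs a couple of comparisons per call, which is why the
question of how often the surrounding code runs matters. Note that clamping
only avoids the exception; the clamped value is still wrong.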
A better solution would have been to shut down this part of the software upon
launch, and to have the default handler check whether the other computer was
still operating before shutting down the computer that generated the
conversion error.
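
The handler behavior I have in mind can be sketched as follows. This is only
an illustration in Python; the Unit class, its fields, and the function name
are hypothetical and are not taken from the flight software.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical stand-in for one of the two redundant computers."""
    name: str
    running: bool = True


def handle_conversion_error(failing: Unit, partner: Unit) -> str:
    """Shut the failing unit down only if the partner can still take over."""
    if partner.running:
        failing.running = False
        return "shutdown"
    # The partner is already down, so stopping this unit would leave nothing
    # running. Keep going, even though the data may be degraded.
    return "continue"
```

With two units that both hit the same error, the first call shuts one unit
down and the second call leaves the remaining unit running.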
> In any case, error trapping and recovery needed to be provided for.
The potential for recovery on a launch system is very limited. If the system
is performing a useful function, having it latch to a fixed value means that
it keeps getting the same incorrect value. If such a value is generated as
part of the guidance system, it is likely that the system will continue off
course (more gradually) and still have to be destroyed. In fact, in this case
ignoring the error and not trying to recover something better might have been
appropriate, as the system was not performing a useful function at the time
the error was generated. What must be done is to analyze the behavior of the
system in sufficient detail to show that no overflow can occur in the properly
functioning system.
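
As a rough illustration of the kind of analysis I mean, the check below asks
whether the worst-case input for a given trajectory, after scaling, still fits
in a 16-bit signed integer. The scale factor and the worst-case figures in the
usage note are invented for the example and are not Ariane data.

```python
INT16_MAX = 32767


def max_safe_input(scale: float) -> float:
    """Largest input magnitude whose scaled value still fits in 16 bits."""
    return INT16_MAX / scale


def fits_int16(worst_case_input: float, scale: float) -> bool:
    """True if the scaled worst-case magnitude cannot overflow.

    Uses the positive limit for both signs, which is slightly conservative.
    """
    return abs(worst_case_input) * scale <= INT16_MAX
```

With a made-up scale of 10.0, a worst case of 3000.0 passes and a worst case
of 4000.0 does not. The point is that the bound must be rechecked whenever the
trajectory, and therefore the worst case, changes.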
In my last message I mentioned Garlington's page. In searching for that page I
found it
http://www.flash.net/~kennieg/ariane.html
and another page I consider to be of even more interest
http://www.rvs.uni-bielefeld.de/~ladkin/Reports/ariane.html
I agree with Ladkin. The real problem with the mission was a problem of
requirements. There was no review to determine whether and how the Ariane 4
requirements might conflict with the Ariane 5 requirements.