Following the IOD and the radiation test campaign, improvements were made to the software and hardware of the NPT30-I2. To reduce the propagation of SETs to analog readings, the number of values averaged for critical sensors has been increased. In addition, the software can now discard wrong readings. This section presents two other major improvements to the propulsion system: the SEL and SEU protections.
A. Single Event Latch-up Protection
The most concerning issue detected during the radiation testing is SEL inside the MB and FCU microcontrollers. If not addressed quickly enough, an SEL can damage the system. One way to detect an SEL is through the rapid increase of the microcontroller temperature.
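As an illustration of this detection principle, the following minimal Python sketch flags a suspected SEL when the temperature rises faster than a threshold between two consecutive readings. The function name, the sampling period, and the threshold are hypothetical and are not taken from the NPT30-I2 firmware.

```python
def sel_suspected(temps_c, sample_period_s, max_rate_c_per_s):
    """Return True if any consecutive pair of readings rises too fast.

    temps_c: temperature readings in degrees Celsius, oldest first.
    sample_period_s: time between two readings (hypothetical value).
    max_rate_c_per_s: allowed heating rate (hypothetical threshold).
    """
    for previous, current in zip(temps_c, temps_c[1:]):
        rate = (current - previous) / sample_period_s
        if rate > max_rate_c_per_s:
            return True
    return False
```

In practice the readings fed to such a check would themselves be averaged, so that a single transient on the analog input is not mistaken for a latch-up.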
A soft reset is not enough to recover from an SEL; the microcontroller must be power cycled. For an FCU failure, the fix is straightforward: the MB can power cycle a subsystem microcontroller using its dedicated power switches as soon as the event has been detected. However, in the hardware version of the MB used for the radiation testing, the MB microcontroller cannot be power cycled in this way, and the system relies on the power source to power cycle the NPT30-I2.
A new version of the MB has been designed to allow the MB to power cycle itself. Instead of the 3.3 V regulator being always enabled, its ON/\(\overline{\text{OFF}}\) pin is connected to two pulse generators in series, as shown in Fig. 13. The first pulse generator, “A”, is connected to the “CPU_ALIVE” signal and has a pulse duration of about 0.1 s. The second, “B”, has a pulse duration of approximately 5 s.
By default, when the program is running, it drives the "CPU_ALIVE" line with a pulse-width modulation (PWM) signal with a period of 50 ms. Because this period is shorter than the pulse duration of A, the PWM signal keeps retriggering pulse generator A and holds its output low. Pulse generator B is therefore never triggered, so its output and the ON/\(\overline{\text{OFF}}\) pin stay high, and the 3.3 V regulator remains enabled.
When the microcontroller detects an SEL, it saves the operation context in the non-volatile memory (NVM) and disables the PWM signal, as shown in Fig. 14.
After 100 ms, pulse generator A times out and its output goes high. This rising edge triggers pulse generator B, which drives its output low. This disables the 3.3 V regulator and turns off the microcontroller. After 5 seconds, the output of pulse generator B returns to the high state, which turns the microcontroller back on. When the MB restarts, it checks the previous firing and reset context, and resumes firing if needed.
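The timing of this self power cycle can be summarized with a small Python model. It is only a sketch of the sequence described above, using the nominal durations (50 ms PWM period, 0.1 s for pulse generator A, 5 s for pulse generator B) expressed in integer milliseconds; the function and constant names are hypothetical, and the real circuit tolerances are ignored.

```python
PWM_PERIOD_MS = 50   # nominal CPU_ALIVE period
PULSE_A_MS = 100     # nominal pulse duration of generator A
PULSE_B_MS = 5000    # nominal pulse duration of generator B


def pwm_keeps_regulator_on():
    """A is retriggered before it times out if the PWM period is shorter."""
    return PWM_PERIOD_MS < PULSE_A_MS


def power_cycle_window(last_edge_ms):
    """Return (off_ms, on_ms) for the 3.3 V regulator.

    last_edge_ms: time of the last CPU_ALIVE edge before the PWM stops.
    A times out PULSE_A_MS later, which triggers B and disables the
    regulator; B releases it again PULSE_B_MS after that.
    """
    off_ms = last_edge_ms + PULSE_A_MS
    on_ms = off_ms + PULSE_B_MS
    return off_ms, on_ms


def regulator_enabled(t_ms, last_edge_ms):
    """True if the regulator is enabled at time t_ms in this model."""
    off_ms, on_ms = power_cycle_window(last_edge_ms)
    return not (off_ms <= t_ms < on_ms)
```

For example, if the last CPU_ALIVE edge occurs at 2000 ms, the model switches the regulator off at 2100 ms and back on at 7100 ms.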
B. Bootloader Improvement
Data and code are stored in NVM, directly in the microcontroller’s flash memory, which is sensitive to radiation effects [7], especially in the COTS version of the microcontroller. The NPT30-I2 MB and subsystems implement a bootloader that greatly enhances the in-flight debugging capability and allows the system to be completely reprogrammed if the application is corrupted or if the firmware needs to be modified for any reason. The architecture of the MB and subsystem memory is shown in Fig. 15.
When an NPT30-I2 board is powered on, the microcontroller starts in bootloader mode. By default, it checks the integrity of the user application by running a secure hash algorithm (SHA) over the application’s memory area and comparing the result with the hard-coded value that was set during the firmware upload. If the two hashes are equal, the microcontroller branches into the application. If the hashes do not match, or if the user requested to stay in bootloader mode, the microcontroller proceeds to the bootloader, where the whole application memory can be reprogrammed.
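The boot decision described above can be sketched in a few lines of Python. The text only specifies "SHA", so the use of SHA-256 here is an assumption, as are the function names and the string return values; the actual firmware runs on the microcontroller and is not written in Python.

```python
import hashlib


def application_is_valid(app_image: bytes, stored_digest: bytes) -> bool:
    """Compare the hash of the application area with the stored value.

    SHA-256 is an assumption; the source does not name the SHA variant.
    """
    return hashlib.sha256(app_image).digest() == stored_digest


def select_boot_target(app_image: bytes, stored_digest: bytes,
                       stay_in_bootloader: bool = False) -> str:
    """Return 'application' or 'bootloader' following the described logic."""
    if stay_in_bootloader:
        return "bootloader"
    if not application_is_valid(app_image, stored_digest):
        return "bootloader"
    return "application"
```

With this check, a single flipped bit anywhere in the application area changes the computed hash, so the microcontroller stays in the bootloader instead of executing a corrupted image.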
However, the bootloader itself may be corrupted. This could remove the ability to update the application, or even prevent the microcontroller from branching into the application, making the propulsion system unusable. To decrease the probability of bootloader corruption, its architecture has been restructured, as shown in Fig. 17.
The improved bootloader is composed of three levels: the previous bootloader, now boot level 3, and two new levels that run before it. The first level is a small, non-redundant piece of code that randomly branches into one of the two copies of level 2, 2a or 2b. This first level consists of only a few dozen instructions, occupying 116 bytes of program memory, which greatly reduces the risk of corruption.
The second level of the bootloader builds the third level by running a triple-voting algorithm on the three voters a, b, and c, and writing the result to the level 3 result memory area. The second level has more instructions than the first, occupying approximately 3800 bytes of program memory, but it has a “same design redundancy” with two identical code sections, 2a and 2b. If a corrupted copy of boot level 2 is selected by the first level and the program gets into a deadlock, the watchdog timer triggers a restart after a few seconds. The first level of the bootloader will eventually select the uncorrupted copy of boot level 2. The safety lies in the fact that the probability of corrupting both copies of boot level 2 is low.
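The retry behavior described above can be illustrated with a short Python simulation. It assumes that the first level picks 2a or 2b uniformly and independently at each restart, which the text does not state explicitly; the function name and the dictionary representation are hypothetical. Under that assumption, with exactly one corrupted copy, the chance of still being stuck after n watchdog restarts is (1/2)^n.

```python
import random


def boot_attempts_until_success(copy_ok, rng):
    """Count boots until level 1 selects a working copy of level 2.

    copy_ok: mapping such as {"2a": False, "2b": True}, where False
             marks a corrupted copy that ends in a watchdog restart.
    rng: a random.Random instance standing in for the level 1 choice
         (uniform, independent selection is an assumption).
    """
    if not any(copy_ok.values()):
        raise ValueError("both copies corrupted: no recovery possible")
    names = sorted(copy_ok)
    attempts = 0
    while True:
        attempts += 1
        chosen = rng.choice(names)
        if copy_ok[chosen]:
            return attempts
        # Corrupted copy selected: the watchdog restarts the board
        # and level 1 makes a new random choice.
```

The case where both copies are corrupted raises an error in this sketch, mirroring the residual risk that the redundancy reduces but does not eliminate.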
After the triple-voting algorithm, the bootloader branches into boot level 3, where the application memory area can be reprogrammed, as in the previous version of the bootloader. The boot level 3 memory area is much bigger than the first two levels of the bootloader, with around 30000 bytes of program memory, and is therefore more prone to memory corruption. The triple modular redundancy ensures the integrity of the third level by correcting any bit that is corrupted in only one of the three voters, as shown in Fig. 18.
To corrupt a bit in the boot level 3 result, that is, the output of the triple-voting algorithm, the same bit must be corrupted in two different voters, which is much less likely than a single corruption.
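A bitwise majority vote of this kind can be written compactly. The following Python sketch is an illustration of the principle, not the flight code; the function name and the use of byte strings to represent the three voters are assumptions.

```python
def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Return the bitwise majority of three equally sized memory images.

    Each output bit is 1 when at least two of the three input bits
    are 1, so a bit flipped in a single voter is corrected.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("voters must have the same length")
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
```

Bit flips at different positions in different voters are all corrected, whereas the same bit flipped in two voters wins the vote and propagates to the result, which is the failure case described above.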
The flowchart in Fig. 19 summarizes the improved three-level bootloader.
This new three-level bootloader has been extensively tested. Thanks to the new architecture and multiple redundancies, the microcontrollers should be more resilient to radiation and SEUs.