I've worked with the government in the past. Their compute systems definitely are not decades behind, though some are. They have the top supercomputers and machines that haven't been updated since the '80s. That's a more realistic picture of the government: it's this gigantic thing that is in no way remotely monolithic. It's more like The Blob, an amorphous mixture of a lot of things, neither this nor that. This is why people get into arguments when discussing the government: it's a hodgepodge of conflicting parts with extreme differences in capacity and effectiveness. The same goes for the military (and this should be more obvious if you know the joke about "military grade" and recognize that our military is the most powerful and modern one; it's the reason all those "more advanced" private sector things exist. Just because something isn't bought in bulk and handed to every private doesn't mean it isn't in the system). So stop painting it with a broad brush, because that leads to conspiracy theories that are just silly (in both directions).
With the military (in addition to what I said) you have to be very aware that it, like most gov entities, outsources work. The private sector is coupled with the military. The US military doesn't build the secret jets on secret military bases, Lockheed and Boeing do. Lockheed built the SR71, which definitely led to many UFO stories.
Spaceflight has an inherently different set of requirements from the types of applications people on here mostly build. NASA doesn't need to run 2 GB of JavaScript dependencies on their rover. They have people who can write lean real-time code in a low-level language, and they need to be sure a bit flip doesn't turn a $3 billion project into rubble. There is no 'fix it in the next sprint' in spaceflight; after you hit the red button, it must work. The RAD750 that Perseverance runs on has some impressive specs, they're just specs that don't matter in a datacenter.
Fun story about that. I knew the lead driver for Curiosity, and he shared a story about how, not long after they got off Mars time, they were driving it up a slope. Before he left work that day, he quickly thought to add a stopping routine in case the rover slid down the hill, since the ground wasn't stable. They came back in the morning worried they had crashed the rover, because the camera was facing a rock. It was a terrifying thirty-ish minutes moving the camera and checking that everything was okay. The routine had saved the rover. (I'm sure the story is a bit exaggerated, but it's still fun and does demonstrate the high risk these systems carry, even if they only move at 0.15 km/h.)
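To give a flavor of what such a protective check might look like, here's a toy sketch; the names, thresholds, and sensor values are all invented for illustration, and actual rover flight software is vastly more involved:

```python
# Hypothetical "stop if we're slipping" check, invented for illustration.
MAX_SLIP_RATIO = 0.4  # assumed limit on commanded vs. actual progress

def slip_ratio(commanded_m: float, measured_m: float) -> float:
    """Fraction of the commanded motion that didn't actually happen."""
    if commanded_m <= 0.0:
        return 0.0
    return 1.0 - (measured_m / commanded_m)

def drive_step(commanded_m: float, measured_m: float) -> str:
    """Return 'CONTINUE' or 'STOP' after one drive increment."""
    if slip_ratio(commanded_m, measured_m) > MAX_SLIP_RATIO:
        return "STOP"  # halt in place and wait for ground in the loop
    return "CONTINUE"

# Commanded 1.0 m but visual odometry only measured 0.5 m of progress:
assert drive_step(1.0, 0.5) == "STOP"
assert drive_step(1.0, 0.9) == "CONTINUE"
```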
It's quite amazing to see that the Mars rover has what you'd call self-driving capabilities, complete with onboard mapping and localization, yet without GPS or any kind of powerful computer.
However, they did put the stereo vision on an FPGA to run it in real time, which is pretty cool too.
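For the curious, the core idea is simple enough to sketch in a few lines. This toy block-matching version (window size, disparity range, and test data all arbitrary) computes roughly what the FPGA pipeline does in hardware, minus the rectification and match validation a real pipeline needs:

```python
# Toy stereo matching: for each left-image pixel, find the horizontal
# shift (disparity) that best aligns it with the right image.
import numpy as np

def disparity_sad(left, right, max_disp=16, win=2):
    """Per-pixel disparity by minimizing sum-of-absolute-differences."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(win, h - win):
        for x in range(win + max_disp, w - win):
            patch = left[y - win:y + win + 1, x - win:x + win + 1]
            costs = [np.abs(patch - right[y - win:y + win + 1,
                                          x - d - win:x - d + win + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp  # depth is then ~ baseline * focal_length / disparity

# Smoke test: synthesize a right image, shift it 4 px to fake a left view.
rng = np.random.default_rng(0)
right_img = rng.random((32, 64))
left_img = np.roll(right_img, 4, axis=1)
print(np.bincount(disparity_sad(left_img, right_img).ravel()).argmax())  # 4
```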
I used to know a number of NASA programmers out of Houston. They did a lot of Java programming on some hyper-optimized, special-purpose JVMs. They liked it because it made running simulations easier and gave good portability for higher-level work.
That was like 20 years ago though. Things could easily be different now.
Exactly. If anyone is going to work _with_ the government, you should be aware of the Technology Readiness Level (TRL)[0]. It's like the cornerstone of how everything operates. It takes decades to get to level 9 sometimes, and when reliability is critical that's what they use. Not only is it "if it ain't broke, don't fix it", but "it ain't broke, and we understand every single component of this system and everything that can go wrong with it, don't 'fix' it."
It can also take a lot of time for something to get from TRL 6 to TRL 7, and from TRL 7 to TRL 8. Sometimes it's realistically impossible to get to TRL 9! The system could certainly be more efficient, and it probably breaks down too often, but the idea itself is fine.
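If you've never seen the scale, here's the gist as a quick lookup; the descriptions are paraphrased from NASA's definitions, and the gate check is just my shorthand for how programs treat it:

```python
# TRL levels, paraphrased from NASA's definitions.
from enum import IntEnum

class TRL(IntEnum):
    BASIC_PRINCIPLES = 1        # basic principles observed and reported
    CONCEPT_FORMULATED = 2      # technology concept formulated
    PROOF_OF_CONCEPT = 3        # analytical/experimental proof of concept
    LAB_VALIDATED = 4           # component validated in a lab environment
    RELEVANT_ENV_VALIDATED = 5  # validated in a relevant environment
    RELEVANT_ENV_DEMO = 6       # prototype demo in a relevant environment
    OPERATIONAL_ENV_DEMO = 7    # prototype demo in the operational environment
    FLIGHT_QUALIFIED = 8        # system completed and flight qualified
    FLIGHT_PROVEN = 9           # proven through successful mission operations

def passes_gate(component: TRL, required: TRL = TRL.FLIGHT_PROVEN) -> bool:
    """The "it ain't broke and we understand it" filter, in one line."""
    return component >= required

print(passes_gate(TRL.OPERATIONAL_ENV_DEMO))  # False: TRL 7 is not TRL 9
```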
FWIW, I've seen things that I 100% believe should be done today, with massive amounts of evidence behind them, that aren't done because they aren't TRL 9. I don't want to start arguments, but there's a certain industry that gets hit with this all the time: they can do tons of complex, detailed models but aren't allowed to build the actual thing because it hasn't been physically demonstrated. Which leads to a weird self-referential roadblock.
> The most recent mars rover uses, essentially, a half-speed imac processor from 1990s.
Isn't this because they need to be specified, tested, and certified to work under high levels of radiation, extreme temperature swings, and a tiny power envelope determined years in advance?
Not everything needs to meet those specific standards, but essentially yes. My sibling comment mentioned Ingenuity (the Mars helicopter), but let's take Curiosity. It often gets cited as costing $2.5 billion. That alone says why you wouldn't want to risk anything (see my other comments, especially about TRL[note]). But it also took almost a decade![other note] Time is also very expensive, so how much would you risk? On a 10-year project that is going to operate for another 12+ years (still going, landed in 2011) with no way to repair or fix the product, you probably want that thing to be reliable as shit. Worth an extra year or two to make that happen.
[note] TRL 9 pretty much doesn't exist in these types of missions. We don't have decades of operation of specific devices on other planets. But you still use very robust and redundant systems.
[other note] People often cite this as a waste of money, but when you consider the timescale it's a pittance in the government budget. Costs could definitely come down, but in government, money can only exchange hands through leaky buckets. NASA politically prides itself on having parts from every state, which from an engineering perspective should sound like a logistics nightmare, and it is. But these are the kinds of people you're voting for: people who love pageantry over pragmatism. (Pageantry gets votes, pragmatism doesn't.)
Not sure if true, but I also recall reading that ICs made with larger processes are more resilient to cosmic ray strikes. So modern 7nm chips might have an inherent disadvantage in these applications.
This is true. You can think about it fairly simply: compare the amount of energy involved in switching a transistor to the energy a cosmic ray deposits. Smaller ICs are also more likely to be hit (the effective cross-section is higher due to density, and since particles interact through fields, no direct physical strike on the silicon is needed).
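Back-of-envelope version of the scaling, if it helps: the critical charge a particle strike has to deposit to flip a node goes roughly like node capacitance times supply voltage, and both shrink with the process node. The numbers below are made-up ballpark figures, purely for illustration:

```python
# First-order model: Q_crit ~ C * V. All figures are illustrative
# guesses, not datasheet values.
def critical_charge_fC(capacitance_fF: float, vdd_V: float) -> float:
    """Charge (femtocoulombs) a strike must deposit to flip the node."""
    return capacitance_fF * vdd_V  # fF * V = fC

old_node = critical_charge_fC(capacitance_fF=5.0, vdd_V=3.3)  # ~250 nm era
new_node = critical_charge_fC(capacitance_fF=0.1, vdd_V=0.8)  # ~7 nm era
print(f"older node needs ~{old_node:.1f} fC to upset")   # ~16.5 fC
print(f"modern node needs ~{new_node:.2f} fC to upset")  # ~0.08 fC
```

Two orders of magnitude less deposited charge to cause an upset, before you even account for how much more densely packed the modern part is.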
But that's not why they cost so much. Rad-hardened ICs are built differently: they're built on a sapphire base, Silicon on Sapphire (SoS), and the process is more complicated and expensive.
That said, more people are looking at commercial off-the-shelf (COTS) parts and just building redundant systems, because two COTS CPUs can be cheaper (and more powerful) and because ECC has gotten much better. This is really only for LEO right now, though it may be used in deeper-space missions later on. Realistically it just depends on how much your ride costs. LEO is cheap now, so the cost of a failure is dramatically reduced. Your ride is still most of your cost, though.
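The redundancy scheme is basically classic triple modular redundancy: run the same computation on multiple cheap processors and take the majority answer. A toy sketch of the voting step (real systems also scrub memory continuously and reset units that disagree):

```python
# Majority vote across redundant COTS units.
from collections import Counter

def tmr_vote(results: list):
    """Return the majority answer, or raise so a supervisor can reset."""
    value, count = Counter(results).most_common(1)[0]
    if count < len(results) // 2 + 1:
        raise RuntimeError("no majority: trigger reset / safe mode")
    return value

# Unit B took a bit flip; the other two outvote it.
print(tmr_vote([42, 46, 42]))  # -> 42
```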
Well, the Mars helicopter essentially uses a cell phone processor, so not really.
And it's not that low power.
But yeah, the mentality was: we know this works, it's too costly to guarantee a new one will work, and too risky to just try it. Same in the military. Imagine if the military didn't have radiation, temperature, etc. requirements.
Sure, but those COTS parts still underwent lots and lots of rad/vac testing. There's a design-for-resilience vs. test-for-resilience tradeoff at play. I'm saying that not all systems require design-for-resilience.
Ingenuity is also running a very customized software stack to go along with the highly tested hardware. IIRC the OS can reboot in some relatively small number of milliseconds. It's meant to be able to crash mid-air, reboot, and recover before the craft loses lift, in most flight envelopes. The software is also set up to crash and reboot rather than try to recover or operate in a compromised state.
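Conceptually something like the sketch below; the budget, state, and function names are invented for illustration (the real stack, built on JPL's F Prime framework IIRC, is far more involved):

```python
# Toy "crash-only" supervisor: on any fault, restart from a known-good
# snapshot instead of limping along in a compromised state.
import time

REBOOT_BUDGET_S = 0.05  # assumed: must regain control before losing lift

def reboot_and_restore() -> dict:
    """Stub: reload the OS and last-known-good state from protected memory."""
    return {"altitude_m": 4.0, "healthy": True}

def supervise(fault_detected: bool) -> dict:
    """Restart cleanly on any fault; never try to patch things up mid-air."""
    if fault_detected:
        t0 = time.monotonic()
        state = reboot_and_restore()
        elapsed = time.monotonic() - t0
        assert elapsed < REBOOT_BUDGET_S, "recovery blew the flight margin"
        return state
    return {"altitude_m": 4.0, "healthy": True}

print(supervise(fault_detected=True))  # recovered within budget
```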
Well, to nitpick, that's because the two are kind of the same: testing is part of design and vice versa in these types of systems. But otherwise I agree with the points you've made. (Seems we've been responding in parallel, lol.)
No, that's not the reason space computers are behind.
Space computers are behind because space technology is based on minimizing risk given the large cost of failure. Using older nodes with well-understood flaws that can be built using old chip equipment is much lower risk than trying to put Intel's latest into a communications satellite.
Few things in space truly require anything state of the art. The best argument I can think of would be realtime image processing using DNNs: imagine you made a fleet of 1M identical exploration bots and you wanted them to filter out most of their imaging data before sending back the best candidates.
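A sketch of what that onboard triage might look like; the scoring function here is just an edge-density stand-in for a real DNN, but the keep-the-best-k selection logic is the point:

```python
# Score every frame locally, downlink only the most interesting ones.
import numpy as np

def interest_score(img: np.ndarray) -> float:
    """Cheap stand-in for a DNN: higher-gradient images rank higher."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.hypot(gx, gy).mean())

def select_for_downlink(images: list, k: int = 2) -> list:
    """Return indices of the k most interesting frames; drop the rest."""
    ranked = sorted(range(len(images)),
                    key=lambda i: interest_score(images[i]), reverse=True)
    return ranked[:k]

rng = np.random.default_rng(1)
frames = [rng.random((16, 16)) * s for s in (0.1, 1.0, 0.5)]  # varying contrast
print(select_for_downlink(frames))  # best candidates first, e.g. [1, 2]
```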