I really don't get why they chose USB-B 3.0 and USB Micro-B 2.0 female connectors for this. USB-C is so much cheaper and more common at this point. Why bother with such old connectors, especially when one could do the job of both?
It is a hardware RNG they are building. The claim is that their solution is going to be more computationally efficient for a narrow class of problems (de-noising step for diffusion AI models) vs current state of the art. Maybe.
This is what they are trying to create, more specifically:
https://pubs.aip.org/aip/apl/article/119/15/150503/40486/Pro...
It's not just a "hardware RNG". An RNG outputs a uniform distribution. This hardware outputs randomness with controllable distributions, potentially extremely complex ones, many orders of magnitude more efficiently than doing it the traditional way with ALUs. The class of problems that can be solved by sampling from extremely complex probability distributions is much larger than you might naively expect.
I was skeptical of Extropic from the start, but what they've shown here exceeded my low expectations. They've made real hardware which is novel and potentially useful in the future after a lot more R&D. Analog computing implemented in existing CMOS processes that can run AI more efficiently by four orders of magnitude would certainly be revolutionary. That final outcome seems far enough away that this should probably still be the domain of university research labs rather than a venture-backed startup, but I still applaud the effort and wish them luck.
An old concept indeed! I think about this Ed Fredkin story a lot... In his words:
"Just a funny story about random numbers: in the early days of computers people wanted to have random numbers for Monte Carlo simulations and stuff like that and so a great big wonderful computer was being designed at MIT’s Lincoln laboratory. It was the largest fastest computer in the world called TX2 and was to have every bell and whistle possible: a display screen that was very fancy and stuff like that. And they decided they were going to solve the random number problem, so they included a register that always yielded a random number; this was really done carefully with radioactive material and Geiger counters, and so on. And so whenever you read this register you got a truly random number, and they thought: “This is a great advance in random numbers for computers!” But the experience was contrary to their expectations! Which was that it turned into a great disaster and everyone ended up hating it: no one writing a program could debug it, because it never ran the same way twice, so ... This was a bit of an exaggeration, but as a result everybody decided that the random number generators of the traditional kind, i.e., shift register sequence generated type and so on, were much better. So that idea got abandoned, and I don’t think it has ever reappeared."
RIP Ed. https://en.wikipedia.org/wiki/Edward_Fredkin
And still today we spend a great deal of effort trying to make our randomly-sampled LLM outputs reproducibly deterministic:
https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
It's funny because that did actually reappear at some point with RDRAND. But it's still only really used for cryptography; if you just need a random distribution, almost everyone uses a PRNG (a non-cryptographic one is a lot faster still, besides being deterministic).
Generating randomness is not a bottleneck, and modern SIMD CPUs should be more than fast enough. I thought they were building approximate computation, where a*b is computed within some error threshold p.
Generating enough random numbers with the right distribution for Gibbs sampling, at incredibly low power, is what their hardware does.
I think that's underselling it a bit, since there are lots of existing ways to have a hardware RNG. They're trying to use lots and lots of hardware RNGs to solve probabilistic problems a little more probabilistically.
I tried this, but not with the "AI magic" angle. It turns out nobody cares because CSPRNGs are random enough and really fast.
https://en.wikipedia.org/wiki/Lavarand
The article you linked uses magnetic tunnel junctions to implement the RNG part.
The Web site of Extropic claims that their hardware devices are made with standard CMOS technology, which cannot make magnetic tunnel junctions.
So it appears that there is no connection between the linked article and what Extropic does.
The idea of stochastic computation is not at all new. I read about such stochastic computers as a young child, more than half a century ago, long before personal computers. The research on them was inspired by hypotheses about how the brain might work.
Along with analog computers, stochastic computers were abandoned due to the fast progress of deterministic digital computers, implemented with logic integrated circuits.
So anything new cannot be about the structure of stochastic computers, which has been well understood for decades, but only about a novel, extremely compact hardware RNG device that could be scaled to a huge number of RNG devices per stochastic computer.
I could not find during a brief browsing of the Extropic site any description about the principle of their hardware RNG, except that it is made with standard CMOS technology. While there are plenty of devices made in standard CMOS that can be used as RNG, they are not reliable enough for stochastic computation (unless you use complex compensation circuits), so Extropic must have found some neat trick to avoid using complex circuitry, assuming that their claims are correct.
However, I am skeptical about their claims because of the amount of BS words used on their pages, which look like they were taken from pseudo-scientific, Star Trek-like mumbo-jumbo, e.g. "thermodynamic computing", "accelerated intelligence", "Extropic" derived from "entropic", and so on.
To be clearer, there is no such thing as "thermodynamic computing", and inventing such meaningless word combinations is insulting to potential customers, as it demonstrates that Extropic's management believes they must be naive morons.
The traditional term for such computing is "stochastic computing". "Stochastics" is an older, and in my opinion better, alternative name for the theory of probabilities. In Ancient Greek, "stochastics" means the science of guessing. Instead of "stochastic computing" one can say "probabilistic computing", but not "thermodynamic computing", which makes no sense (unless the Extropic computers are dual use, besides computing, they also provide heating and hot water for a great number of houses!).
Like analog computers, stochastic computers are a good choice only for low-precision computations. With increased precision, the amount of required hardware increases much faster for analog computers and for stochastic computers than for deterministic digital computers.
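To make the precision problem concrete, here is a minimal sketch of classic unipolar stochastic computing (a textbook toy, not Extropic's design): a value in [0, 1] is encoded as the density of 1s in a random bitstream, multiplication is a single AND gate, and each extra bit of precision roughly quadruples the required stream length.

    # Classic unipolar stochastic computing, as a toy illustration only.
    # A value p in [0, 1] is encoded as the probability of a 1 in a random
    # bitstream; multiplying two values is a bitwise AND of two independent
    # streams. Precision improves only as 1/sqrt(N).
    import numpy as np

    rng = np.random.default_rng(0)

    def encode(p, n):
        """Encode value p as a length-n bitstream with P(bit = 1) = p."""
        return rng.random(n) < p

    def decode(stream):
        """Estimate the encoded value as the fraction of 1s."""
        return stream.mean()

    a, b, n = 0.6, 0.7, 100_000
    product = decode(encode(a, n) & encode(b, n))   # AND gate = multiplier
    print(product, a * b)   # roughly 0.42, with error shrinking ~ 1/sqrt(n)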
The only currently important application that is happy even with precisions under 16 bit is AI/ML, so trying to market their product for AI applications is normal for Extropic, but they should provide more meaningful information about what advantages their product might have.
If you want to understand exactly what we are building, read our blog posts and then our paper:
https://extropic.ai/writing https://arxiv.org/abs/2510.23972
I was hoping the preprint would explain the mysterious ancient runes on the device chassis :(
i dig it.
people are so scared of losing market share because of an art choice that they make all of their products smooth dark grey rectangles with no features.
ugly.
at least this one has some sense of beauty, the courage to make a decision about what looks good to them and act on it. they'll probably have to change the heptagon shape though, because there's no way that becomes a standard
it costs so little to add artistic flair to a product; it's really a shame so few companies do
When I was a child, I was so enchanted by the look of the Cray supercomputers of old, with their built-in furniture and great open arrays of status indicators. There is really something to making a machine show you the wonder of creation it unlocks.
You might like Storm Summoner. https://kabaragoya.com
It looks super cool. I feel like I'm watching cyberpunk come to life with the way we're talking about technology these days, but this also looks straight out of the Neuromancer of my imagination.
The answer is that they're cosplaying sci-fi movies, in an attempt to woo investors.
Why are you replying under every other comment here in this low effort, negative manner?
i think they're a hater
What, is a bit of whimsy illegal?
A product of dubious niche value that has this much effort put into window dressing is suspicious.
how much effort is it really to draw some doodles on the 3d model?
can you play doom on it, yet?
I don't really understand the purpose of hyping up a launch announcement and then not making any effort whatsoever to make the progress comprehensible to anyone without advanced expertise in the field.
That's the intention. Fill it up with enough jargon and gobbledegook that it looks impressive to investors, while hiding the fact that there's no real technology underneath.
You not comprehending a technology does not automatically make it vaporware.
>jargon and gobbledegook
>no real technology underneath
They're literally shipping real hardware. They also put out a paper + posted their code too.
Flippant insults will not cut it.
Nice try. It's smoke and mirrors. Tell me one thing it does better than a 20 year old CPU.
This hardware is an analog simulator for Gibbs sampling, which is an idealized physical process that describes random systems with large-scale structure. The energy-efficiency gains come from the fact that it's analog. It may seem like jargon, but Gibbs sampling is an extremely well-known concept with decades of work behind it and connections to many areas of statistics, probability theory, and machine learning. The algorithmic problem they need to solve is how to harness Gibbs sampling for large scale ML tasks, but arguably this isn't really a huge leap; it's very similar to EBM learning/sampling, but with the advantage of being able to sample larger systems for the same energy.
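To make the jargon concrete, here is a minimal software sketch of a Gibbs sweep on a toy Ising-style energy model. The couplings and temperature below are arbitrary, and this is the textbook algorithm, not Extropic's implementation; the point is that the inner loop is just conditional coin flips, which is the operation such hardware would do natively.

    # Gibbs sampling on a toy Ising-style energy model,
    # E(s) = -sum_ij J_ij s_i s_j with spins s_i in {-1, +1}.
    # Each update is a biased coin flip conditioned on the neighbours.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 16
    J = rng.normal(scale=0.3, size=(n, n))
    J = (J + J.T) / 2            # symmetric couplings
    np.fill_diagonal(J, 0.0)
    beta = 1.0                   # inverse temperature (arbitrary)

    s = rng.choice([-1, 1], size=n)
    for sweep in range(1000):
        for i in range(n):
            field = J[i] @ s                                  # local field
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))  # P(s_i = +1 | rest)
            s[i] = 1 if rng.random() < p_up else -1           # conditional coin flip

    print("final sample:", s)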
> The algorithmic problem they need to solve is how to harness Gibbs sampling for large scale ML tasks, but arguably this isn't really a huge leap,
Is it?
The paper is pretty dense, but Figure 1 is Fashion-MNIST, which is "28x28 grayscale images" - that does not seem very real-life to me. Can they work on bigger data? I assume not yet; otherwise they'd have put something more impressive in Figure 1.
In the same way, it is totally unclear what kind of energy they are talking about in absolute terms - if you say "we've saved 0.1 J on training jobs", that is simply not impressive enough. And how much overhead is there? Amdahl's law is a thing: if you super-optimize a step that takes 1% of the time, the overall improvement will be negligible even if the savings for that step are enormous.
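For concreteness, a quick Amdahl's-law sketch - the fractions and speedups below are made-up, purely illustrative numbers:

    # Back-of-envelope Amdahl's law: even an enormous speedup on a step that
    # is only a small fraction of total runtime barely moves the overall number.
    def overall_speedup(fraction_accelerated, step_speedup):
        return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / step_speedup)

    print(overall_speedup(0.01, 10_000))   # ~1.01x: 10,000x speedup on 1% of the work
    print(overall_speedup(0.50, 10_000))   # ~2.0x:  the same speedup on 50% of the work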
I've written a few CS papers myself back in the day, and the general idea was to always put the best results at the front. So they are either bad communicators, or they don't highlight answers to my questions because they don't have many impressive things (yet?). Their website is nifty, so I suspect the latter.
More insults and a blanket refusal to engage with the material. Ok.
If you think comparing hardware performance is an insult, then you have some emotional issues or are a troll.
Ah, more insults. This will be my final reply to you.
I'll say it again. The hardware exists. The paper and code are there. If someone wants to insist that it's fake or whatever, they need to come up with something better than permutations of "u r stoopid" (your response to their paper: https://news.ycombinator.com/item?id=45753471). Just engage with the actual material. If there's a solid criticism, I'd like to hear it too.
I've noticed recently that HN is resembling slashdot more. I wonder what's causing it.
The fact that there's real hardware and a paper doesn't mean the product is actually worth anything. It's very possible to make something (especially some extremely simplified 'proof of concept' which is not actually useful at all) and massively oversell it. Looking at the paper, it looks like it may have some very niche applications but it's really not obvious that it would be enough to justify the investment needed to make it better than existing general purpose hardware, and the amount of effort that's been put into 'sizzle' aimed at investors makes it look disingenuous.
>The fact that there's real hardware and a paper doesn't mean the product is actually worth anything.
I said you can't dismiss someone's hardware + paper + code solely based on insults. That's what I said. That was my argument. Speaking of which:
>disingenuous
>sizzle
>oversell
>dubious niche value
>window dressing
>suspicious
For the life of me I can't understand how any of this is an appropriate response when the other guy is showing you math and circuits.
No, they're not showing just math and circuits, they're also showing a very splashy and snazzy front page which makes all kinds of vague, exciting-sounding claims that aren't really backed up by the very boring (though sometimes useful) application of that math and circuits (neat though the design of those circuits may be).
If this was just the paper, I'd say 'cool area of research, dunno if it'll find application though'. I'm criticizing the business case and the messaging around it, not the implementation.
Two important questions I think illustrate my point:
1) The paper shows an FPGA implementation which has a 10x speedup compared to a CPU or GPU implementation. Extropic's first customer would have leapt up and started trying to use the FPGA version immediately. Has anyone done this?
2) The paper shows the projected real implementation being ~10x faster than the FPGA version. This is similar to the speedup going from an FPGA to an ASIC implementation of a digital circuit, which is a standard process which requires some notable up-front cost but much less than developing and debugging custom analog chips. Why not go this route, at least initially?
The fact that they show a comparison with an FPGA is a red flag, because large scale generative AI is their biggest weakness.
FPGAs are superior in every respect for models of up to a few megabytes in size and scale all the way down to zero. If they are going for generative AI, they wouldn't even have bothered with FPGAs, because only the highest end FPGAs with HBM are even viable and even then, they come with dedicated AI accelerators.
I don't know about your earlier point, but those questions are perfectly reasonable and a springboard for further discussion. Yes, that's where the rubber hits the road. That's the way to go.
If Extropic (or any similar competitor) can unlock these hypothetical gains, I'd like to see it sorted out asap.
"no really technology underneath" zzzzzzzzzzz
What's not comprehensible?
It's just miniaturized lava lamps.
A lava lamp that just produces randomness, e.g. for cryptographic purposes, is different from the benefit here, which is to produce specific randomness at a low energy cost.
This seems to be the page that describes the low level details of what the hardware aims to do. https://extropic.ai/writing/tsu-101-an-entirely-new-type-of-...
To me, the biggest limitation is that you’d need an entirely new stack to support a new paradigm. It doesn’t seem compatible with using existing pretrained models. There’s plenty of ways to have much more efficient paradigms of computation, but it’ll be a long while before any are mature enough to show substantial value.
I’ve been wondering how long it would take for someone to try probabilistic computing for AI workloads - the imprecision inherent in the workload makes it ideally suited for AI matrix math with a significant power reduction. My professor in university was researching this space, and it seemed very interesting. I never thought it could supplant CPUs necessarily, but massive compute applications that don’t require precise math, like 3D rendering (and now AI), always seemed like a natural fit.
I don't think it does AI matrix math with a significant power reduction; rather, it just seems to provide RNG? I may be wrong - with my limited knowledge, I don't think what you are saying is true. Maybe someone can tell us what the reality of it is: whether it can do AI matrix math with a significant power reduction, or whether that's even their goal right now. To me it currently feels like a lava-lamp-equivalent thing, as another commenter said.
The paper talks about some quite old-school AI techniques (the kind of thing I learned about in university a decade ago, when it was already on its way out). It's not anything to do with matrix multiplications (well, not with computing them faster directly) but instead with being able to sample from a complex distribution more efficiently by having dedicated circuits to simulate elements of that distribution in hardware. So it won't make your neural nets any faster.
I'm still waiting for my memristors.
there is also Normal Computing[0], which is trying different approaches to chips like that. Anyway, these are very difficult problems, and Extropic has already abandoned some of its initial claims about superconductors to pivot to more classical CMOS circuits[1]
[0]: https://www.normalcomputing.com
[1]: https://www.zach.be/p/making-unconventional-computing-practi...
This gives me Devs vibes (2020 TV series) - https://www.indiewire.com/awards/industry/devs-cinematograph...
Such an underrated TV show.
Yes, the billionaire driving a Subaru Forester was my favorite part
That's what they're trying to do, yeah. To give off a cool vibe I mean. To raise more money. There is nothing even remotely as cool in their real (or not) product. I was very excited when they started specifically because of their cool branding, but the vibe quickly wears off.
The cool thing about Silicon Valley is that serious people try stuff that may seem wild and unlikely, and on the off chance it works, all of humanity benefits. This looks like Atomic Semi, Joby Aviation, maybe even OpenAI in its early days.
The bad thing about Silicon Valley is that charlatans abuse this openness and friendly spirit and swindle investors out of millions with pipe dreams and worthless technology. I think the second is inevitable as Silicon Valley becomes more famous and higher status without a strong gatekeeping mechanism, which would also be anathema to its open ethos. Unfortunately this company is firmly in the second category. A performative startup, “changing the world” to satisfy the neurosis of its founders, who desperately want to be seen as someone taking risks to change the world. In reality it will change nothing and go die in the dustbin of history. I hope he enjoys his 15 minutes of fame.
What makes you so sure that extropic is the second and not the first?
Fundamentally, gut feel from following the founder on Twitter. But if I had to explain: I don’t understand the point of speeding up or getting a true RNG; even for diffusion models this is not a big bottleneck, so it sounds more like a buzzword than actual practical technology.
Having a TRNG is easy: you just reverse-bias a zener diode or use any number of other strategies that rely on physics for noise. Hype is a strategy they're clearly taking, and people in this thread are very dismissive (I get why - Extropic has been vague-posting for years and makes it sound like vaporware), but what does everyone think they're actually doing with the money? It's not a better dice roller...
What is it if not a better dice roller though? Isn't that what they are claiming it is? And also that this better dice rolling is very important (and I admittedly am not someone who can evaluate this)
Yes, I think they claim they are a far better dice roller in randomness and speed, and that this is very important. The first might be true, but I don’t see why the second is in any way true. These all need to be true for this company to make sense:
1. They build a chip that does random sampling far better than any GPU (is this even proven yet?)
2. They use a model architecture that utilizes this sampling advantage which means most of the computation must be concentrated at sampling. This might be true for energy based models or some future architecture we have no idea about. AFAIK, this is not even true for diffusion.
3. This model architecture must outcompete autoregressive models in economically useful tasks, whether language modeling or robotics etc.; right now autoregressive transformers are still king across all tasks.
And then their chip will be bought by hyperscalers and their company will become successful. There are just so many ifs outside of them building their core technology that this whole project makes no sense. And you can say that this is true for all startups; I don’t think that’s the case, this is just ridiculous.
that's the main reason i don't trust extropic.
I listened to the Hinton podcast a few days ago. He mentioned (IIRC) that "analog" AIs are bad because the models cannot be transferred/duplicated in a lossless way, like in .gguf format; every analog system is built differently, so you have to re-learn/re-train it again somehow.
Do TSUs have the same issue?
Is this the new term for analog VLSI?
Or if we call it analog is it too obvious what the problems are going to be?
In non-deterministic computation, verification will be a key challenge. Curious to see how companies address this.
How should we think about how much effective compute is being done with these devices compared to classical (GPU) computing? Obviously FLOPs doesn't make sense, so what does?
I like this but based on what I am seeing here and the THRML readme, I would describe this as "an ML stack that is fully prepared for the Bayesian revolution of 2003-2015." A kind of AI equivalent of, like, post-9/11 airport security. I mean this in a value-neutral way, as personally I think that era of models was very beautiful.
The core idea of THRML, as I understand it, is to present a nice programming interface to hardware where coin-flipping is vanishingly cheap. This is moderately useful to deep learning, but the artisanally hand-crafted models of the mid-2000s did essentially nothing at all except flip coins, and it would have been enormously helpful to have something like this in the wild at that time.
The core "trick" of the era was to make certain very useful but intractable distributions built on something called "infinitely exchangeable sequences" merely almost intractable. The trick, roughly, was that conditioning on some measure space makes those sequences plain-old iid, which (via a small amount of graduate-level math) implies that a collection of "outcomes" can be thought of as a random sample of the underlying distribution. And that, in turn, meant that the model training regimens of the time did a lot of sampling, or coin-flipping, as we have said here.
Peruse the THRML README[1] and you'll see the who's who of techniques and modeling procedures of the time: "Gibbs sampling", "probabilistic graphical models", "energy-based models", and so on. All of these are weaponized coin flipping.
I imagine the terminus of this school of thought is basically a natively probabilistic programming environment. Garden-variety deterministic computing is essentially probabilistic computing where every statement returns a value with probability 1. So in that sense, probabilistic computing is a full generalization of deterministic computing, since an `if` might return a value with some probability other than 1. There was an entire genre of languages like this, e.g., Church. And now, 22 years later, we have our own hardware for it. (Incidentally this line of inquiry is also how we know that conditional joint distributions are Turing complete.)
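(A toy illustration of the "probabilistic if" in ordinary Python rather than Church - the probabilities here are arbitrary. Running the program many times samples from the distribution the program denotes.)

    # A branch taken with probability p rather than deterministically;
    # the deterministic `if` is the special case p = 1.0.
    import random
    from collections import Counter

    def flip(p=0.5):
        return random.random() < p

    def noisy_program():
        if flip(0.7):
            return "heads-branch"
        return "tails-branch"

    print(Counter(noisy_program() for _ in range(10_000)))  # roughly a 70/30 split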
Tragically, I think, this may have arrived too late. This is not nearly as helpful in the world of deep learning, with its large, ugly, and relatively sample-free models. Everyone hates to hear that you're cheering from the sidelines, but this time I really am. I think it's a great idea, just too late.
[1]: https://github.com/extropic-ai/thrml/blob/7f40e5cbc460a4e2e9...
Really informative insight, thanks. I'm not too familiar with those models, is there any chance that this hardware could lead to a renaissance of sample-based methods? Given efficient hardware, would they scale to LLM size, and/or would they allow ML to answer some types of currently unanswerable questions?
Any time something costs trillionths of a cent to do, there is an enormous economic incentive to turn literally everything you can into that thing. Since the 50s “that thing” has been arithmetic, and as a result, we’ve just spent 70 years trying to turn everything from HR records to images into arithmetic.
Whether “that thing” is about to be sampling is not for me to say. The probability is certainly not 0 though.
Hype aside, if you can get an answer with error bars to a computing problem in significantly less time, where precision just isn’t that important (such as with LLMs), this could be a game changer.
Precision actually matters a decent amount in LLMs. Quantization is used strategically in places that will minimize performance degradation, and models are robust enough that some loss in performance still gives a good model. I’m skeptical how well this would turn out, but it’s probably always possible to remedy precision loss with a sufficiently larger model.
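(As a toy illustration of the precision being traded away, here is plain symmetric int8 quantization of a fake weight tensor - not any particular model's scheme: values are snapped to 256 levels and the worst-case rounding error is half a quantization step.)

    # Symmetric int8 quantization of a made-up weight tensor.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=4096).astype(np.float32)   # fake weights

    scale = np.abs(w).max() / 127.0
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)   # quantize
    w_dq = w_q.astype(np.float32) * scale                           # dequantize

    print("max abs error:", np.abs(w - w_dq).max(), "<= step/2 =", scale / 2)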
LLMs are inherently probabilistic. Things like ReLU throw out a ton of data deliberately.
No that isn’t throwing out data. Activation functions perform a nonlinear transformation to increase the expressivity of a function. If you did two matrix multiplications without ReLU in between, your function contains less information than with a ReLU in between.
How are you calculating "less information"?
I think what they meant was:
Two linear transformations compose into a single linear transformation. If you have y = W2(W1*x) = (W2*W1)*x = W*x where W = W2*W1, you've just done one matrix multiply instead of two. The composition of linear functions is linear.
The ReLU breaks this because it's nonlinear: ReLU(W1*x) can't be rewritten as some W*x, so W2(ReLU(W1*x)) can't collapse either.
Without nonlinearities like ReLU, many layers of a neural network could be collapsed into a single matrix multiplication. This inherently limits the function approximation that it can do, because linear functions are not very good at approximating nonlinear functions. And there are many nonlinearities involved in modeling speech, video, etc.
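A quick numerical check of the collapse argument, with arbitrary shapes and values: without a nonlinearity, two layers are exactly one matrix, while inserting a ReLU breaks the equivalence.

    # Two linear layers collapse to one matrix; a ReLU in between does not.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
    x = rng.normal(size=4)

    linear_two_layer = W2 @ (W1 @ x)
    collapsed = (W2 @ W1) @ x
    print(np.allclose(linear_two_layer, collapsed))   # True: composition is linear

    relu = lambda z: np.maximum(z, 0.0)
    with_relu = W2 @ relu(W1 @ x)
    print(np.allclose(with_relu, collapsed))          # False in general: ReLU breaks it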
Question for the experts in the field: why does this need to be a CPU and not a dongle you plug into a server and query?
Has anyone received the dev board? What did you do with it? Curious what this can do.
This looks really amazing, if not unbelievable, to the point where it is almost too good to be real.
I have not seen benchmarks on Extropic's new computing hardware yet, but I would need to hear from experts in the field of AI infrastructure at the semiconductor level whether this is legit.
I'm 75% of the way to believing this is real, but I hold 25% skepticism and will reserve judgement until others have tried the hardware.
So my only question for the remaining 25%:
Is this a scam?
Too good to be true, incomprehensible jargon to go along...
I mean it sure looks like a scam.
I really like the design of it though.
I doubt it’s a scam. Beff might be on to something or completely delusional, but not actively scamming.
The best conmen have an abundance of confidence in themselves.
This one just released a prototype platform, "XTR-0", so if it's a fraud the jig will shortly be up.
https://extropic.ai/writing/inside-x0-and-xtr-0
I think it's more a concern that the hardware isn't useful in the real world, rather than that the hardware doesn't meet the description they provide of it.
Looks like an artifact from Assassin's Creed or Halo.
i've followed them for a while and as just a general technologist and not a scientist, i have a probably wrong idea of what they do, but perhaps correcting it will let others write about it more accurately.
my handwavy analogy interpretation was that they were in effect building an analog computer for AI model training, using some ideas that originated in quantum computing. their insight is that since model training is itself probabilistic, you don't need discrete binary computation to do it; you just need something that implements the sigmoid function for training a NN.
they had some physics to show they could cause a bunch of atoms to polarize (conceptually) instantaneously using the thermodynamic properties of a material, and the result would be mostly deterministic over large samples. the result is what they are calling a "probabilistic bit", or pbit, which is an inferred state over a probability distribution. where the inference is incorrect, they just "get it in post": the flow of training data through a network of these pbits is so much more efficient that it's faster to augment and correct the result in the model afterwards than to use classical clock cycles to compute it directly.
It's finally here! Extropic has been working on this since 2022. I'm really excited to see how this performs in the real world.
Nice!
This is "quantum" computing, btw.
Actually it's not. Here's some stuff to read to get a clearer picture! https://extropic.ai/writing
It strictly is not, as no quantum phenomena are being measured (hence the quotes); but if all goes well with Extropic, you'll most likely end up doing quantum again.
Usually there's a negative correlation between the fanciness of a startup webpage and the actual value/product they'll deliver.
This gives "hype" vibes.
Interesting you say that, I had an instinctual reaction in that vein as well. I chalked it up to bias since I couldn’t think of any concrete examples. Something about the webpage being so nice made me think they’ve spent a lot of time on it (relative to their product?) Admittedly I’m nowhere close to even trying to understand their paper, but I’m interested in seeing what others think about it
I've seen it as well. One thing that's universally true about potential competitor startups in the field I work in is that the ones who don't actually have anything concrete to show have way nicer websites than ours (some have significantly more funding and still nothing to show).
I have a passing familiarity with the areas they talk about in the paper, and it feels... dubious. Mainly because of the dedicated accelerator problem. Even dedicated neural net accelerators are having difficulty gaining traction against general purpose compute units in a market that is ludicrously hot for neural net processing, and this is talking about accelerating Monte-Carlo processes which are pretty damn niche in application nowadays (especially in situations where you're compute-limited). So even if they succeed in speeding up that application, it's hard to see how worthwhile it would be. And it's not obvious from the publicly available information whether they're close to even beating the FPGA emulation of the concept which was used in the paper.
I'm more impressed that my laptop fans came on when I loaded the page.
It's the one attached to my TV that just runs movies/YT - I don't recall the last time I heard the fans.
They did say thermodynamic computing.
Same on a serious dev machine. That page just pegs a core at max, it's sort of impressive.
I ran it through Google's PageSpeed Insights.
It scored 34 for mobile and completely timed out for desktop with a time limit exceeded warning.
It’s hilariously bad.