A few days ago, I published a positive article on Advanced Micro Devices (AMD) titled “AMD: An Epyc Wild Card.” In it, I explained why AMD looked set to take a meaningful (~10%) share of the server market – a high value market at that.
Yet there are also things going horribly wrong at AMD. How can some things be going so right while others go so wrong at the same time? There seems to be a simple explanation.
On the side that’s going right, CPUs, you have (had) a factor that the other side, GPUs, does not have. That factor is Jim Keller. Jim Keller was at AMD when it last became competitive with Intel (NASDAQ:INTC) back in the early Opteron days. Jim Keller was at Apple (NASDAQ:AAPL) right before Apple had a breakthrough with its own custom-designed CPUs. And Jim Keller was at AMD again when it designed the new Zen cores at the base of its current resurgence. Jim Keller is gone now (to Tesla (NASDAQ:TSLA)), but let’s leave that for later.
AMD, however, isn’t just CPUs. It also has a discrete GPU segment, which it got from acquiring ATI back in 2006. And there, as I will show, things are going horribly wrong. How can this be going wrong when CPUs are going right? Well, apparently, that would be because Jim Keller doesn’t do GPUs.
Here’s Why GPUs Are Going Horribly Wrong
It’s very easy to see that AMD has had a huge problem when it comes to GPUs. Since May 2016, when Nvidia (NVDA) launched the GeForce 10 series, AMD has had no competitive GPU at the top end of the market. AMD has had nothing to compete with either the Nvidia GTX 1070 or the GTX 1080, based on Nvidia’s Pascal architecture.
Only now, nearly 1.5 years later, is AMD getting ready to launch its new Vega architecture, namely the Vega RX for the consumer market. Early indications, though, show that both the Vega Frontier Edition and the Vega RX struggle to even match the Nvidia GTX 1080 in performance terms. That is, cards based on AMD’s new architecture only just match Nvidia’s top-end cards from 1.5 years ago.
It gets worse, though. Much worse. Consider the following:
- The new Vega cards are said to consume a lot more power than the GTX 1080. The GTX 1080 has a 180W TDP; the new AMD cards have 300W TDPs. This has implications for cost of ownership, cooling and overclocking headroom. It makes the new Vega cards less desirable, arguably even at a lower price than the GTX 1080.
- And what’s ugliest is that the Vega Frontier Edition and the Vega RX are based on a monster 484 mm² die and carry HBM2 memory. Why is this ugly? Because such massive and costly specs are needed just to perform at the level of the GTX 1080, which uses a much smaller (and thus less costly) 314 mm² die and cheaper GDDR5X memory. In other words, these Vega cards are at a massive cost disadvantage from the get-go, and they’ll also sell in much lower quantities, adding a massive economy-of-scale disadvantage on top of that.
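To put rough numbers on the two bullets above, here is a minimal back-of-the-envelope sketch. It uses only the TDP and die-size figures quoted in this article. Everything else is an assumption made for illustration: a 300 mm wafer, the common first-order dies-per-wafer approximation (which ignores yield, scribe lines and any difference in wafer price between foundries), four hours of full-load use per day, and an electricity price of $0.12 per kWh. The function names are hypothetical.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Gross dies per wafer using a common first-order approximation.

    Assumption: 300 mm wafer by default; yield and scribe lines are ignored.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Correction term for partial dies lost around the wafer edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def annual_energy_cost(tdp_watts: float, hours_per_day: float, price_per_kwh: float) -> float:
    """Yearly electricity cost if the card ran at its TDP for the given daily hours."""
    return tdp_watts / 1000 * hours_per_day * 365 * price_per_kwh


# Die sizes and TDPs as quoted in the article.
vega_dies = dies_per_wafer(484)
gtx1080_dies = dies_per_wafer(314)
print(f"Gross dies per wafer: Vega {vega_dies}, GTX 1080 {gtx1080_dies}")
print(f"GTX 1080 gets {gtx1080_dies / vega_dies:.2f}x as many dies per wafer")

# Usage hours and electricity price are illustrative assumptions.
extra_cost = annual_energy_cost(300, 4, 0.12) - annual_energy_cost(180, 4, 0.12)
print(f"Extra electricity cost per year at full load: ${extra_cost:.2f}")
```

Under these assumptions, a wafer yields roughly 115 Vega-sized dies against roughly 187 GTX 1080-sized dies, before the larger die's likely lower yield and the cost of HBM2 are even considered. The electricity gap is small for a single gamer, but it scales with the number of cards and the hours of use.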
Some will quickly say that these cards will shine in the data center, where GPUs are increasingly being used for AI purposes. In other words, that these cards will bring the fight to Nvidia in AI.
Unfortunately, there’s no hope in that either, and for a very simple reason:
- Nvidia is now bringing out its Volta architecture. The Volta architecture includes tensor cores specifically for AI jobs, which massively accelerate neural network training and inference. The Vega architecture has no such specific-purpose cores and thus will be unable to compete when it comes to AI applications.
There really is no saving grace, either on the consumer side of things or the data center side of things. Vega is stillborn, unlike Zen.
I wrote positively on AMD’s prospects in the server market, and I think that, ultimately, AMD will perform decently there. However, there’s an emerging technical issue that AMD needs to address quickly, lest Zen-based CPUs get a bad reputation for reliability, something that is a complete “no-no” when it comes to servers.
What is this issue I am writing about? It arises when Ryzen chips are used to compile on Linux under heavy load. Many users are reporting that in such scenarios GCC often crashes with a segfault.
Ordinarily, such an issue would quickly be addressed. However, in this case, the issue has been running since May with no definitive resolution. There are stop-gap measures, which involve crippling the CPU, but obviously, those aren’t acceptable.
Things are going very right for AMD when it comes to CPUs, be it on the consumer or the server side. However, things are going very wrong for AMD when it comes to GPUs, be it on the consumer or the server side.
Arguably, the CPU improvement will be more visible in the P&L than the GPU disaster. This is so because the CPU side will improve from a very low level and conquer a slice of a very large market, whereas the GPU side will simply stay at a very low level.
A realistic view of the company shouldn’t ignore these developments. Likewise, a realistic view of the company shouldn’t ignore that the possible architect of AMD’s recovery is no longer with the company. This doesn’t mean that AMD can’t keep the momentum and stay competitive for 3-4 more years just on Keller’s prior work. The thing is, AMD doesn’t trade just on what it might do over the next 3-4 years.
Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.
I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.