The Age Of Nvidia

Posted on June 10, 2017 by TheSaint in DirectXFiles, GPU Programming, Graphics

Well it appears that the GPU era of computing is finally here! Intel is in deep trouble. For those of you who haven’t read my blog extensively over the years, I started the original DirectX team at Microsoft way back in 1994, created the Direct3D API with the other early DirectX co-creators (Craig Eisler and Eric Engstrom), and promoted its adoption to the video game industry and to graphics chip makers. There are lots of stories about that here on this blog, but the one that is relevant to this post is this blog article I wrote back in 2013.

The Nvidia Story

” I think that Nvidia’s vision for the future of gaming is the right one and I’m very excited to be alive in an era when I can work with such amazing computing power.  I feel like I lived to see the era when I could walk on the bridge of the Enterprise and play with warp cores…. literally… because that’s what Nvidia calls the smallest unit of parallel threads you can run on a GPU.”

For those of you who follow the stock market, you may have noticed that Nvidia’s stock price has soared recently after many years of slow, creeping progress. I’m going to argue that this sudden surge in their share price heralds a revolutionary shift in computing, the culmination of many years of progress in GPGPU development. Intel has held a dominant monopoly over enterprise computing for many years, successfully fending off all challengers to their supremacy in that space. That dominance is ending this year, and the market sees it coming. To understand what is going on and why it is happening now, I’m going to take a jump back in time to my earlier years at Microsoft.

In the 1990s Bill Gates coined the term “coopetition” to describe Microsoft’s strained competitive partnerships with other leading tech giants of the era. The term came up a lot in reference to Intel. While Microsoft’s and Intel’s fates had become deeply intertwined, the two companies constantly struggled for dominance over one another, and both had teams of people “specialized” in trying to achieve that dominance. At the time, Microsoft executive Paul Maritz (then Executive VP of Platforms) was very concerned that Intel might attempt to “virtualize” Windows, enabling many competing operating systems to enter the market and co-exist on the PC desktop with Windows. *Paul Maritz later went on to become CEO of VMware… just saying… Indeed, Intel was heavily invested in just such an effort. One of their strategies was to attempt to emulate in software all of the conventional “special” hardware functionality that PC OEMs typically included with a PC, such as video cards, modems, sound cards, and networking. By moving all external computing onto the Intel processor, Intel could choke off the sales and growth of any alternative computing platform that might grow to threaten the value of an Intel CPU. It was specifically Intel’s announcement of 3DR in early 1994 that spurred Microsoft to create DirectX.

I worked for the team at Microsoft that was responsible for “positioning” Microsoft strategically against competitive threats in the market: DRG (the Developer Relations Group). Intel had requested that Microsoft send a “representative” to speak at their launch event for 3DR. As DRG’s resident graphics and 3D expert, I was sent on Microsoft’s behalf with the specific mission of evaluating the threat that Intel’s new initiative represented to Microsoft and formulating an effective counter-strategy. My assessment was that Intel was indeed attempting to virtualize Windows by emulating in software all possible competitive external processing. I wrote a proposal called “Taking Fun Seriously” that suggested that the way to prevent Intel from making Windows “dispensable” was to create a competitive consumer marketplace for new hardware capabilities. The idea was to create a new suite of Windows drivers that enabled massive competition in the hardware market around new audio, input, video, networking, and other media capabilities, all of which would depend on proprietary Windows drivers to work across a new market we would create for PC-based video games. Intel would not be able to keep up with the free-market competition we created among consumer hardware companies, and would therefore never be able to build a CPU that could effectively virtualize all of the functionality consumers demanded. Thus DirectX was born.

There are many stories on this blog about the events that surrounded the creation of DirectX, but in short our “evil scheme” was wildly successful. Microsoft realized that the way to dominate the consumer market and keep Intel at bay was to focus on video games, and dozens of 3D video chip makers were born. Twenty-some years later, Nvidia is among the handful of survivors, along with ATI (since acquired by AMD), that came to dominate first the consumer graphics market and increasingly the enterprise computing market.

This brings us to today, 2017, the year GPUs finally begin to permanently displace the venerated x86-based CPU. Why now, and why GPUs? The secret to the x86 hegemony has been Windows and backwards compatibility of the x86 instruction set all the way back to the 1970s. Intel has been able to maintain and grow its enterprise monopoly because the cost of porting applications to any other CPU instruction set with no market share is prohibitive. The phenomenal body of functionality enabled by the Windows OS and tied to the x86 platform further entrenched Intel’s market position. The beginning of the end for Intel came when Microsoft AND Intel both failed to make the leap to the emerging mobile computing market. For the first time in decades a major crack in the x86 CPU market opened; ARM-based CPUs filled it, and alternative OSes from Apple and Google captured the newly opened market. Why did Microsoft and Intel fail to make the leap? There are a lot of interesting reasons, but for the purposes of this article the one I would like to highlight is the baggage of x86 backwards compatibility. For the first time, power efficiency became more important to the success of a CPU than speed. All of the transistors and all of the millions of lines of x86 code that Intel and Microsoft had invested in the PC became an obstacle to power efficiency. The most important asset of Microsoft and Intel’s market hegemony became a liability overnight.

Intel’s need to constantly achieve increased performance while maintaining backwards compatibility forced Intel to spend more and more power-hungry transistors for diminishing performance returns in each new generation of x86 CPU. Backwards compatibility also severely impeded Intel’s ability to make their chips more parallel. While the first GPUs were highly parallel out of the gate in the 1990s, Intel did not release its first dual-core CPU until 2005. Even today in 2017, Intel’s most powerful CPUs only manage 24 processing cores, compared to the thousands found in most modern video cards. GPUs, which were intrinsically parallel, had no legacy compatibility baggage to carry and, enabled by architecture-independent APIs like Direct3D and OpenGL, were free to innovate and increase their parallelism without compromising compatibility or transistor efficiency. By 2005 GPUs had even become GENERAL PURPOSE computing platforms supporting heterogeneous generalized parallel computing. (Heterogeneous here refers to the fact that an AMD and an Nvidia chip can run the same compiled programs despite having entirely different low-level architectures and instruction sets.) While Intel chips were achieving diminishing performance returns, GPUs were doubling in performance while halving their power requirements every 12 months! Extreme parallelism enabled very efficient transistor utilization, ensuring that each transistor added to a GPU could be effectively deployed for speed, while a growing percentage of new x86 transistors were going to waste.
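The kind of workload that scales across thousands of GPU cores is easy to picture with a toy SAXPY example. The sketch below is my own illustration in NumPy (standing in for real GPU code), not something from the era: each output element depends only on its own inputs, so on a GPU every element could be handed to a separate thread.

```python
import numpy as np

# SAXPY (out = a*x + y): a classic data-parallel kernel. Each output
# element depends only on x[i] and y[i], so a GPU could assign one
# thread per element and compute them all at once.
def saxpy_serial(a, x, y):
    out = np.empty_like(x)
    for i in range(len(x)):   # one core grinding through elements in order
        out[i] = a * x[i] + y[i]
    return out

def saxpy_vectorized(a, x, y):
    return a * x + y          # the whole array in one data-parallel operation

n = 100_000
x = np.arange(n, dtype=np.float32)
y = np.ones(n, dtype=np.float32)

serial = saxpy_serial(2.0, x, y)
vectorized = saxpy_vectorized(2.0, x, y)
assert np.allclose(serial, vectorized)
```

On real hardware the same kernel written in CUDA or OpenCL would launch one lightweight thread per element, which is why adding cores to a GPU translates almost directly into throughput.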

Although GPUs were increasingly making inroads into enterprise supercomputing, media production, and VDI solutions, the major market turning point came when Google began using GPUs effectively to train neural networks to do really useful things. The market realized that artificial intelligence would be the future of big-data processing and would open vast new automation markets. GPUs were ideally suited to running neural network applications. Until this point Intel had successfully relied on two approaches to suppress the growing influence of GPUs in enterprise computing:

  1. Keep the PCIe bus slow and limit the number of IO lanes an Intel CPU supports, ensuring that GPUs are always dependent on an Intel CPU to feed their workload and remain separated from many valuable real-time and HPC applications by latency and PCIe bandwidth constraints. As long as their CPUs could throttle application access to GPU performance, Nvidia remained safely marooned on the other side of the PCIe bus from many practical enterprise workloads.
  2. Provide cheap but minimally functional GPUs on consumer CPUs to confine Nvidia and AMD to the premium gaming market and keep them out of mainstream adoption.
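To put point 1 in perspective, here is a back-of-the-envelope comparison of my own; the bandwidth figures are rough, assumed numbers for 2017-era hardware, not figures from the post:

```python
# Assumed, approximate 2017-era bandwidth figures (GB/s), illustrative only:
PCIE3_X16 = 15.75   # usable PCIe 3.0 x16 bandwidth
GPU_LOCAL = 480.0   # on-board GDDR5X/HBM2 memory bandwidth

data_gb = 4.0  # a hypothetical 4 GB working set

pcie_seconds = data_gb / PCIE3_X16    # time just to ship the data to the GPU
local_seconds = data_gb / GPU_LOCAL   # time for the GPU to stream it locally

ratio = GPU_LOCAL / PCIE3_X16
print(f"PCIe transfer: {pcie_seconds * 1000:.0f} ms")
print(f"local read:    {local_seconds * 1000:.1f} ms")
print(f"the bus is ~{ratio:.0f}x slower than the GPU's own memory")
```

A workload that has to cross the bus on every iteration is starved by an order of magnitude before the GPU computes anything, which is exactly the leash Intel held.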

The growing threat from Nvidia and Intel’s own failed attempts to create x86-compatible supercomputing accelerators caused Intel to try another tactic: they acquired Altera and plan to include programmable FPGAs with next-generation Intel CPUs. This is a very clever way of enabling Intel CPUs to support dramatically better IO capabilities than their PCIe-constrained counterparts while preventing GPUs from benefiting from those enhancements. Backing FPGAs also gave Intel a way to move toward supporting greater parallelism on their chips without benefiting the growing GPU-based application market, and it enabled enterprise hardware vendors to create highly specialized custom hardware solutions that were still x86-dependent. The move was tactically brilliant on Intel’s part because it acted to exclude GPUs from penetrating the enterprise market on several axes at once. Brilliant, but probably doomed to fail.

Now in five easy news clips, the reason I believe that the x86 party ends in 2017…

  1. SoftBank raises $93B from companies with a common interest in displacing Intel
  2. SoftBank buys ARM
  3. SoftBank “buys” Nvidia
  4. Nvidia launches an ARM/Hybrid mobile chip… the X2 with…
  5. …GPU accelerated ARM cores on it

Why is this sequence of events important? Because this is the year that the first generation of self-hosting GPUs is widely available on the market, able to run its own OS with no PCIe obstacles. Nvidia does not need an x86 CPU anymore. ARM has a vast body of consumer and enterprise OSes and applications ported to it. All enterprise and cloud hardware makers adopt ARM chips as controllers for a vast array of their current market solutions, and ARM chips are already integrated with all leading FPGA solutions. ARM chips trade performance for low power, but GPUs are extremely fast and also power-efficient, so the GPU can provide the processing muscle while the ARM cores handle the mundane IO and UI management tasks that don’t demand a lot of compute power. The growing body of big-data, HPC, and especially machine learning applications doesn’t need Windows and doesn’t perform on x86. So 2017 is the year Nvidia slips its leash and breaks free to become a genuinely viable alternative to x86-based enterprise computing in valuable new markets that are unsuited to x86-based solutions.

If an ARM processor isn’t beefy enough for your big-data computing needs, IBM has also partnered with Nvidia to produce a generation of monster number-crunching CPUs, with the Power9 sporting 160 PCIe lanes.

AMD has also launched their new Ryzen CPUs, and unlike Intel, AMD has no strategic interest in choking off PCIe performance. Their consumer-grade chips sport 64 PCIe 3.0 lanes and their pro chips will support 128. AMD is also launching HIP, a new cross-compilation tool that makes CUDA applications designed for Nvidia GPUs compatible with AMD GPUs. Despite being competitors, both companies will benefit from flanking Intel in the enterprise market with alternative approaches to GPU-based computing.

All of this means that GPU-based solutions will sweep enterprise computing at an accelerating rate in the coming years, with the world of desktop UI-driven computing increasingly relegated to virtualization in the cloud or to mobile ARM processors, as even Microsoft has announced Windows support for ARM.

Put it all together and I predict that within a few years all we will hear about is the battle between GPUs and FPGAs for enterprise computing supremacy as the CPU era fades into slow decline.

*…and Quantum computing will prove to be irrelevant the entire time…






  1. > Thus DirectX was born.

    If Microsoft did not create DirectX and instead just focused on supporting OpenGL and other hardware APIs – could that have been enough to “keep Intel at bay”?

    • Remember that in the DirectX era Microsoft ALSO controlled OpenGL although most folks don’t realize that. There are several articles on this blog about the early days of OpenGL but here’s one of the more relevant ones.

      OpenGL was NOT a hardware driver model, so OpenGL did NOTHING to create a competitive market for 3D chips. Direct3D WAS Microsoft’s driver model for 3D hardware, including OpenGL. There was a lot of confused screaming about that at the time because people didn’t understand that OpenGL was not a general driver architecture for 3D just because SGI could hardware-accelerate their implementation of it. SGI was also dying in that era; there was little reason to believe that OpenGL had a life beyond SGI and CAD applications, and the OpenGL team at Microsoft shared that attitude. They could not be persuaded to focus on OpenGL as a gaming platform until Direct3D forced the issue. Ironically, the success of Direct3D AND OpenGL in gaming was the best possible outcome, because together they prevented Microsoft from trying to impose its own Talisman hardware standard on all PC 3D, which would have been a disaster for the entire industry. (Search Talisman on this blog.)

      • @Alex,

        > OpenGL was NOT a hardware driver model

        Wikipedia thinks that OpenGL is an API for drivers:
        OpenGL standardized access to hardware, pushed the development responsibility of hardware interface programs (sometimes called device drivers) to hardware manufacturers, and delegated windowing functions to the underlying operating system.

        Microsoft released Direct3D in 1995, which eventually became the main competitor of OpenGL.

        So if Microsoft did not create Direct3D, but the Windows team did a superior job at supporting the OpenGL API – why wouldn’t that have helped Windows build a moat around its dominance?

        > Search Talisman on this blog

          Yeah, sorry, your random web source is far less informed on the subject than the guy who was there and made the decisions. OpenGL was definitely not a driver model. SGI’s proprietary hardware at the time only supported polygon rasterization. Direct3D was the first 3D driver model that enabled competing hardware vendors to accelerate lighting, geometry transformation, texture mapping, and Z-buffering. We standardized hardware texture and geometry formats to enable games to use one format that worked with multiple HW solutions. More importantly, it was actually integrated with Windows, allowing the Windows UI, mouse interaction, and 2D rendering functions to interact with 3D scenes, and it enabled the 2D UI to share hardware memory with the 3D page buffers. As a consequence of Direct3D’s success, the Windows OpenGL team got competitive with Direct3D and started innovating on OpenGL. They had refused to work with us to support gaming hardware until we released our own driver model for it. It was about that time that MSFT decided to hand it ALL over to the Talisman team WITH SGI’s SUPPORT.

          Despite this being another idiotically incorrect Wiki article, it does describe the history of Fahrenheit, Microsoft’s horrible Talisman API. Direct3D was actually first launched with DirectX 2.0 and then massively upgraded with DirectX 3.0. DirectX APIs were version-numbered with the API family releases; we didn’t give each API its own version number.

          • @Alex,

            > As a consequence of Direct3D’s success the Windows OpenGL team got competitive with Direct3D and started innovating on OpenGL.

            Do you mean that without Microsoft Direct3D effort, 3D drivers ecosystem would have stagnated (which, in turn, would risk Microsoft Windows domination)?

          • Microsoft was secretly negotiating with SGI at the time for SGI to become a Windows OEM. One of SGI’s requirements was that WindowsNT be able to run OpenGL. The ONLY reason Microsoft had an OpenGL team initially was to satisfy SGI’s requirements. Anything else we requested they declined because once SGI was on board that was the extent of Microsoft’s interest in OpenGL. At the time nobody knew SGI was going Windows so they didn’t know that OpenGL control had basically been handed to Microsoft. There was no “3D drivers ecosystem” there was just SGI proprietary rasterizer HW. So yes, the non-existent ecosystem for OpenGL drivers would have continued to not exist until Direct3D was created, creating an ecosystem for Direct3D drivers, creating demand for OpenGL support for HW that wasn’t Windows dependent, which forced the Windows OpenGL team to start thinking about 3D drivers beyond what SGI needed.

  2. Another interesting piece to add would be that Microsoft is bringing an x86 emulation layer to Windows on ARM. Legal issues notwithstanding, surely those “mundane” CPU tasks could handle a performance hit. Over time, we all might be freed from x86, after all.

  3. Would you mind changing the colors and/or shapes in the AMD v NVIDIA graph?
    It’s currently hard to read if you have weak red-green colour vision (Deuteranomaly), which affects 6% of males and thus works out to about 220 million individuals worldwide…

  4. If you don’t mind me asking, Alex: how does Richard Huddy fit into the DirectX story? His name always seems to be linked to the API.

    • Never heard of him. He wasn’t involved in the early days but a lot of great people have been involved with DirectX over the many generations it has been in the market.

      • Thanks for the reply. I only asked because he is sometimes referred to as the Godfather of DirectX, like in this article for example :

        From searching around he was part of a company which got taken over by MS and their tech incorporated into DirectX.
        The company was Rendermorphics :

        • I’m embarrassed to say that I don’t remember him. Eric Engstrom and I led the acquisition of Rendermorphics. I did the due diligence and identified them as the right acquisition candidate. Other contenders at the time were Epic, Criterion, and Argonaut. I chose Rendermorphics because they had the largest team of 3D experts. Richard may very well have been among them but it’s been a long time. The founders Servan and Kate came to work for us in the US. Servan worked for Eric and Kate worked for me as head of 3D evangelism. Not all of the original Rendermorphics team relocated to the US. I don’t recall any personal interactions with Richard in that era but I’ll check with Eric and see what he remembers. Servan was certainly the leading influence over the early generations of Direct3D. Stand by.

          • Okay my old DirectX colleagues, Colin McCartney, now responsible for Windows support at Nvidia, and Phil Taylor, former D3D evangelist, confirm that Richard joined 3DLabs from Criterion. Criterion is the same company Colin came from. Colin recalls that Richard joined shortly after the Rendermorphics acquisition and worked at the UK office on the D3D driver and retained mode API performance optimizations. I never met him but he was indeed a member of the early D3D team in the UK office.

  5. I’m not too sure that I buy the “GPU panacea” narrative; this is probably due to my background, which is in the development of scientific and engineering software. Firstly, there is a lot of implicit parallelism in the x86 architecture as exposed through the SSE and AVX instruction sets. Add to this the exploitation of a few cores and you can get huge leaps in performance. Secondly, there have been some serious issues with numerical instabilities associated with floating point operations on GPUs; according to a mathematics professor I spoke to, they had built a huge GPU-based “cluster” to aid with some physical models that lent themselves to massively parallel design, but they found that the parts involving iterative solvers often diverged because of numerical instabilities. He explained that there is a new generation of CPUs that will have many, many cores and they will move onto that once it becomes economic enough.

    • Sounds like a compiler issue: the GPU uses a mode called “FASTMATH” by default because it’s designed for video games. You have to toggle a specific compiler setting to get IEEE floating point. It’s a common mistake. I’ve also run into scientists trying to use the X1 for low-power supercomputing who don’t know that they need to disable power-saving mode on the CPU to get the best performance. The SSE and AVX instructions perform like nothing compared to a modern GPU. The fastest, best-optimized SSE and AVX code is 1/10th to 1/20th the speed of a modern GPU. The CPU will always be around 1/10th the speed of the GPU for most compute-intensive applications, simply because the CPU’s memory bus is 1/10th the speed of the GPU’s. CPUs waste a vast volume of their transistors on branch prediction, which is nice for quickly writing mildly parallel code. The GPU dispenses with most branch prediction, allowing it to dedicate ALL of its cores to useful computation at the expense of the software developer having to write GPU-optimized code. I have 60 engineers who just specialize in helping scientists port their old C code to the GPU. We just finished a coral reef simulation project that got 500X faster: it went from taking 8 days to run on a massive x86 supercomputer to 22 minutes on a laptop GPU. It’s generally far more work to extract the best performance out of a combination of SSE and AVX instructions than it is to optimize for a GPU. The coral reef sim is a massive Lagrangian equation solver, a perfect floating point match with the x86 version.

      Intel can’t outperform the GPU architecture for compute performance in the future by adding more cores without giving up x86 compatibility. The x86 instruction set was never designed to scale efficiently to parallelism.
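      The fast-math pitfall is easy to reproduce on any machine. The snippet below is my own generic illustration of why relaxed floating-point modes can break iterative solvers: once an implementation is free to accumulate sums naively, small terms vanish against large ones. (It demonstrates the rounding effect itself, not Nvidia's actual compiler output.)

```python
import math

# Magnitudes differ wildly: 1e16 + 1.0 rounds back to 1e16, because
# the gap between adjacent doubles near 1e16 is 2.0.
values = [1e16, 1.0, -1e16]

naive = 0.0
for v in values:              # plain left-to-right accumulation
    naive += v                # the 1.0 is lost at the first addition

careful = math.fsum(values)   # exactly rounded (IEEE-careful) summation

print(naive)    # 0.0
print(careful)  # 1.0
```

      An iterative solver whose residual depends on cancellations like this can diverge under a fast-math build while converging under strict IEEE settings, which matches the behavior the professor described.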

      • Yeah, I think it is just a background issue, I don’t really have the expertise to be honest. I don’t exactly fear about my future job prospects but it’s getting very hard to keep up. I guess the GPU-development has been in the offing for quite a while but when you’ve got to develop and maintain stuff that’s already out there you have little time to experiment. It certainly seems like things are shifting.

      • Fascinating article with great historical perspective! Thanks for sharing!
        On the topic of moving existing simulation C code to the GPU, what language do you use to program for the GPU? What libraries? What development tools? Do you have links to share for those of us interested in learning how to develop code optimized for GPUs? Thanks!

