Apple’s Metal API, first impressions

Just to be absolutely clear with everybody: these impressions are based solely on a review of Apple’s publicly available Metal API documentation, found here:

https://developer.apple.com/library/prerelease/ios/documentation/Miscellaneous/Conceptual/MTLProgGuide/Introduction/Introduction.html

It’s very difficult to accurately or fairly judge a new technology without getting your hands on an SDK and digging in to try to use it.  That said, I confess that I’m a little excited about what I see here.  It’s easy to see things you WANT to see in an API that don’t necessarily work when you try to use it, but here’s what I was looking for… the big “obstacle” to modern 3D graphics has been that the traditional OpenGL/DirectX-like graphics pipeline was very linear and heavily moderated by the CPU.  In the era before GPUs became incredibly fast, parallel and general purpose, this was a very efficient architecture, because the extreme linear pipelining of the graphics API gave the GPU hardware every opportunity to optimize that pipeline for maximum throughput.

Times, however, have changed: CPUs are relatively slow, and buses to the GPU are relatively slow and getting relatively “slower” as GPU parallelism accelerates.  This makes the host operating system, CPU and bus a growing performance bottleneck.  Furthermore, GPUs have gotten so fast and versatile that they are increasingly capable of running the whole game… graphics, physics, AI, everything… and it has become increasingly practical to think about new kinds of game engines that treat graphics, physics, AI and sound as parts of the same world model instead of separate components that have to be artificially stitched together by the game developer.  The desire to create “grand unified” game engines is hard to resist, but the traditional graphics pipeline has largely isolated graphics rendering from the other aspects of game design.

Here is a crude graph of what the modern DirectX/OpenGL graphics pipeline looks like.  Notice that it’s not exactly a linear pipeline as it was in its early days.  The addition of “valves” and feedback loops to the pipeline are modern adaptations to the growing need to interject or mediate intermediate stages in the pipeline to achieve particular graphics effects.  For example, most modern games use a shadowing technique (shadow mapping) that involves rendering a given scene repeatedly from the point of view of each light in the scene in order to correctly light and shadow the complete scene.  Thus each scene may make multiple passes through the pipeline to complete a single image frame in a game.  Making lights dynamic (allowing them to move around and change properties) has largely been sacrificed in many games in favour of static realism.  Because the 3D pipeline is so specialized for graphics, game physics and AI are generally computed elsewhere (mostly on the CPU) and incorporated into the resulting scene.  Thus a modern 3D pipeline isn’t really linear anymore.  It has joints, loops and interception points that are mostly artefacts of earlier eras in GPU architecture, 3D authoring tools and engine design.
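
To make the multi-pass point concrete, here is a schematic host-side sketch (the types and helper names are hypothetical stand-ins, not any real API) of how a CPU-driven engine typically orchestrates those per-light passes today, one after another:

#include <vector>

// Hypothetical stand-ins for whatever the engine and graphics API provide.
struct Light { /* position, direction, ... */ };
struct Scene { std::vector<Light> lights; };

static void render_shadow_map_for(const Light&, const Scene&) { /* depth-only pass from the light's view */ }
static void render_final_scene(const Scene&)                  { /* camera pass that samples the shadow maps */ }

// The shape of a typical CPU-driven frame: one full trip through the
// pipeline per light, serialized by the CPU, then one final lit pass.
void draw_frame(const Scene& scene)
{
    for (const Light& light : scene.lights)
        render_shadow_map_for(light, scene);
    render_final_scene(scene);
}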

In an era when the GPU is almost perfectly capable of running the entire game and is inherently massively parallel, it would be ideal if a game engine could dynamically control every stage of the pipeline in parallel and mix and match graphics functionality with the more general compute style of programming.  For example, for a 3D scene with 3 light sources that each need to be rendered to generate light volumes, why not run all three renders concurrently and then composite them in a final stage, INSTEAD of the CPU calling the render loop 4 times sequentially to achieve the same result?  Mixing GPU compute programming with graphics programming has generally been difficult to date.  A developer could either rely on a CPU-bound graphics API like OpenGL to inject programmable shader code into the graphics pipeline in a highly constrained way, OR use a more general purpose GPU programming language like CUDA or OpenCL and rely on graphics interoperability functions to crudely mix and match OpenGL calls with CUDA- or OpenCL-powered computations.  None of it has been a very elegant solution, which is what led me to previously suggest that our existing graphics APIs have reached obsolescence.  They’re hindering progress, not enabling it.
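
To make that “crude mix and match” concrete, here is a minimal sketch of what today’s CUDA/OpenGL interop dance typically looks like: the GL buffer has to be registered, mapped, handed to a kernel, and unmapped before OpenGL can touch it again, all driven step by step from the CPU.  The kernel and the idea of animating a vertex buffer are just hypothetical examples.

#include <GL/gl.h>
#include <cuda_gl_interop.h>   // CUDA <-> OpenGL interop API

// Hypothetical kernel that nudges vertex positions stored in a GL vertex buffer.
__global__ void animate_vertices(float4* verts, int count, float time)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        verts[i].y += 0.01f * sinf(time + verts[i].x);
}

void update_with_cuda(GLuint vbo, int vertexCount, float time)
{
    // 1. Register the GL buffer object with CUDA (normally done once at startup).
    cudaGraphicsResource* resource = NULL;
    cudaGraphicsGLRegisterBuffer(&resource, vbo, cudaGraphicsRegisterFlagsNone);

    // 2. Map it so CUDA can see it, and fetch a raw device pointer.
    cudaGraphicsMapResources(1, &resource, 0);
    float4* devPtr = NULL;
    size_t numBytes = 0;
    cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &numBytes, resource);

    // 3. Run the compute work on the graphics resource.
    animate_vertices<<<(vertexCount + 255) / 256, 256>>>(devPtr, vertexCount, time);

    // 4. Unmap and unregister so OpenGL can draw from the buffer again.
    cudaGraphicsUnmapResources(1, &resource, 0);
    cudaGraphicsUnregisterResource(resource);
}

Every one of those steps is issued and ordered by the CPU, which is exactly the kind of host-side mediation a flattened, unified pipeline could remove.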

On first inspection of Apple’s Metal API documentation I see several familiar CUDA-like references to general purpose compute kernels, and programming semantics very similar to CUDA or OpenCL sitting directly alongside familiar OpenGL/DirectX-like semantics for programmable vertex and pixel shaders.  At first perusal it appears that Apple may actually have flattened the graphics pipeline… exactly as desired… such that it becomes possible to create many concurrent pipelines which may or may not be graphical in nature but can all be executed simultaneously.

In the illustration in Apple’s Metal documentation we see what appear to be multiple “custom” constructed graphics pipelines with compute elements getting assembled to all execute concurrently in separate parallel threads.  Some of the pipelines are 3D, some are 2D image blits, and some are compute packages that may or may not execute as part of a graphics package.  In my earlier example, it would appear that I could indeed render a given scene from three lighting positions concurrently using this approach.  This, then, appears to be a FORWARD step in the direction of a unified graphics/compute architecture.  I hate to admit it, but I want to get excited about this… Gimme some slack, people, I’m the Direct3D guy, this kills me to admit…
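
As a point of reference from the compute world (this is CUDA, not Metal), the concurrency being described is roughly analogous to issuing independent work on separate streams and letting the hardware overlap it.  The kernel below is a hypothetical stand-in for “render the scene from one light’s point of view”:

#include <cuda_runtime.h>

// Hypothetical stand-in for one per-light render pass.
__global__ void render_light_pass(float* depthMap, int numPixels, int lightIndex)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPixels)
        depthMap[i] = (float)lightIndex;   // placeholder for real shadow-map work
}

int main()
{
    const int kLights = 3;
    const int kPixels = 1024 * 1024;

    cudaStream_t streams[kLights];
    float*       depthMaps[kLights];

    // One stream and one depth buffer per light.
    for (int l = 0; l < kLights; ++l) {
        cudaStreamCreate(&streams[l]);
        cudaMalloc(&depthMaps[l], kPixels * sizeof(float));
    }

    // Launch all three "light passes" back to back without waiting on each other;
    // the GPU is free to overlap them if it has the resources.
    for (int l = 0; l < kLights; ++l)
        render_light_pass<<<(kPixels + 255) / 256, 256, 0, streams[l]>>>(depthMaps[l], kPixels, l);

    // A final compositing pass would go here, after the three passes are synchronized.
    cudaDeviceSynchronize();

    for (int l = 0; l < kLights; ++l) {
        cudaFree(depthMaps[l]);
        cudaStreamDestroy(streams[l]);
    }
    return 0;
}

Metal’s command buffers and encoders appear (on paper, at least) to give graphics work the same kind of freedom.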


So here we see something in the Metal doc that appears to be analogous to a CUDA __global__ compute function.

kernel void filter_main(
    texture2d<float, access::read>   inputImage  [[ texture(0) ]],
    texture2d<float, access::write>  outputImage [[ texture(1) ]],
    uint2 gid                        [[ global_id ]],
    texture2d<float, access::sample> table       [[ texture(2) ]],
    constant Parameters* params     [[ buffer(0) ]]
    )
{
    float2 p0 = static_cast<float2>(gid);
    float3x3 transform = params->transform;
    float4 dims = params->dims;
    float4 v0 = read_and_transform(inputImage, p0, transform);
    float4 v1 = filter_table(v0, table, dims);
    outputImage.write(v1, gid);
}

In this example the code resembles a traditional OpenGL or DirectX shader function or a general purpose CUDA __global__ kernel function.  The keyword “kernel” marks this as a GPU function.  The double bracket syntax appears to be Metal’s syntax for indicating the buffers and textures in GPU memory that the arguments are bound to, and the function body itself looks like modern C++11 code which is clearly interoperating with graphics functionality.  Apple’s Metal documentation says that the Metal shader compiler will be C++11 compatible, which means that more advanced C++ template functions may also be arriving on the GPU very soon.  *CUDA doesn’t support this fully yet, but then CUDA is a live API and this is just a document at the moment, so there is a good chance that Nvidia and Apple will arrive at C++11-compliant GPU compilers in the same near time frame.
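
For comparison, here is roughly how the same filter might be written as a CUDA __global__ kernel today.  This is my sketch, not code from either vendor: the helper functions mirror the names in Apple’s sample and are given trivial stand-in bodies, and I’m using plain device pointers rather than texture objects to keep it short.

#include <cuda_runtime.h>

struct Parameters {          // mirrors the Metal sample's constant buffer
    float  transform[9];     // 3x3 transform, row-major
    float4 dims;
};

// Trivial stand-ins for the read_and_transform() and filter_table() helpers
// referenced (but not shown) in Apple's sample.
__device__ float4 read_and_transform(const float4* img, int w, int h, float2 p, const float* xf)
{
    float tx = xf[0] * p.x + xf[1] * p.y + xf[2];
    float ty = xf[3] * p.x + xf[4] * p.y + xf[5];
    int x = min(max((int)tx, 0), w - 1);
    int y = min(max((int)ty, 0), h - 1);
    return img[y * w + x];
}

__device__ float4 filter_table(float4 v, const float4* table, float4 dims)
{
    int idx = max(0, min((int)(v.x * (dims.x - 1.0f)), (int)dims.x - 1));
    return table[idx];
}

__global__ void filter_main(const float4* inputImage, float4* outputImage,
                            const float4* table, const Parameters* params,
                            int width, int height)
{
    // CUDA spells Metal's [[ global_id ]] as block and thread indices.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float2 p0 = make_float2((float)x, (float)y);
    float4 v0 = read_and_transform(inputImage, width, height, p0, params->transform);
    float4 v1 = filter_table(v0, table, params->dims);
    outputImage[y * width + x] = v1;
}

The structural similarity is the point: a grid of threads, per-thread coordinates, resources passed in as arguments, and a single write per thread at the end.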

I confess that on first read Apple’s document looks pretty complex, but then graphics has never been simple, and trying to mix general purpose GPU programming with a deconstructed classical 3D pipeline is an ambitious undertaking.  On examination it seems to be the right idea, but it’s impossible for me to judge how well it works until I can get my hands on it.  I suspect that this new style of parallel graphics programming will present a learning curve for most 3D programmers.  The big observation I would take away from this is that, unlike the media characterization of this being a Mantle-like solution for accelerating DRAW calls, I get the impression that the deconstructed graphics pipeline is actually the real news event here.  I know that if I had access to a deconstructed 3D pipeline from CUDA I would be thrilled, so if this is Apple’s vision for how that challenge can be tackled, I’m eager to see more of it.

*Update:  After I posted this it occurred to me that I might be guilty of a little observer bias.  I’ve been working with CUDA so long that any time I see a reference to something like a GPU kernel I just assume that it has CUDA-like properties, such as the ability for a GPU kernel to launch other GPU kernel functions.  I don’t recall noticing that Metal kernels supported this… if they don’t, then it means that the API is still CPU bound, just more parallel than before.  That would be an improvement, but not the exciting leap to pure GPU computing I would hope for.  I’ll have to look more closely to see if that feature is really missing…
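
For readers who haven’t used it, this is roughly what that CUDA feature (dynamic parallelism, compute capability 3.5+, compiled with -rdc=true) looks like: a kernel launching further kernels with no CPU round trip.  The kernel names and the “tile” framing are just illustrative.

#include <cuda_runtime.h>

__global__ void shade_tile(float* image, int tileOffset, int tileSize)
{
    int i = tileOffset + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < tileOffset + tileSize)
        image[i] = 1.0f;   // placeholder shading work
}

// Parent kernel: decides on the GPU how much child work to launch.
__global__ void schedule_frame(float* image, int numTiles, int tileSize)
{
    int tile = blockIdx.x * blockDim.x + threadIdx.x;
    if (tile < numTiles) {
        // Device-side launch: no CPU involvement at all.
        shade_tile<<<(tileSize + 255) / 256, 256>>>(image, tile * tileSize, tileSize);
    }
}

int main()
{
    const int numTiles = 64, tileSize = 4096;
    float* image = NULL;
    cudaMalloc(&image, numTiles * tileSize * sizeof(float));

    schedule_frame<<<1, numTiles>>>(image, numTiles, tileSize);
    cudaDeviceSynchronize();

    cudaFree(image);
    return 0;
}

If Metal kernels can’t do the equivalent of that device-side launch, the CPU is still in the scheduling loop, which is the distinction the update above is about.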

24 Comments

  1. The [[brackets]] are C++11 attribute syntax. Compute shaders aren’t really new; Metal’s aren’t really much greater or less than the capabilities of DX11/GL4.3.

    There is some nice stuff and novelty in Metal, but I wouldn’t class it as a revelation. Largely it’s what developers have been pushing for for a while. There are areas where it’s doing what everybody else should be doing (and many people have been yelling at Khronos to do to OpenGL), and there are areas where it’s beaten by the competition (if you can finagle your engine into the OpenGL 4 AZDO (“Almost Zero Driver Overhead”) techniques, they’re unbeatable. They’re undoubtedly what you should be doing on modern desktop hardware, but they’re tricky, and not universally applicable. Still, one must credit the sheer efficiency they exhibit – up to 4x the draws of the next best techniques).

    Basically, what should happen is that “OpenGL 5” is another backwards-incompatible release like GL3 was, which dikes out all the APIs which let you do silly things like make invalid buffers and all the deprecated ways of doing things (like explicit texture bindings – you pass a sampler direct to your shader now), migrates EXT_direct_state_access to a core extension, and steals some of the threading stuff from Metal and co (Presumably Mantle has threading also? AMD, where are your docs?).

    Of course, last time Khronos diked out the old legacy cruft people shouted at them. On the other hand, this time people are shouting at them to do it, and maybe Mantle and Metal will light a fire under them.

    Either they’ll realize they’re at a crossroads, or they will die. However, NVIDIA won’t let that happen (they would be ceding the entire – growing – non-Windows market to AMD), AMD shouldn’t let that happen (their OpenGL team is quite vocal) and Intel probably won’t let that happen (They depend upon the “filter down” from high end hardware to be good enough).

    And, really, who do you think are the loudest voices on the Khronos board?

    • http://www.codesynthesis.com/~boris/blog/2012/04/18/cxx11-generalized-attributes/
      That is really interesting, I have not seen those in use before. It looked to me as though they were using them in the same way that CUDA uses the <<< >>> angle bracket notation, so I assumed it was syntax proprietary to Apple’s compute compiler. It also looked to me as though they were using them as a way of indicating the properties of different types of GPU-based memory buffers. At a glance I assumed that the syntax was intended to be a clever alternative to CUDA’s older host->gpu and gpu->host memory copy functions.
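
      For anyone who hasn’t seen it, this is the older, explicitly CPU-driven CUDA style I’m referring to: the <<<grid, block>>> launch notation plus hand-written host-to-GPU and GPU-to-host copies (the kernel itself is just a toy example):

      #include <cuda_runtime.h>
      #include <cstdlib>

      __global__ void scale(float* data, int n, float k)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) data[i] *= k;
      }

      int main()
      {
          const int n = 1 << 20;
          float* host = (float*)malloc(n * sizeof(float));
          for (int i = 0; i < n; ++i) host[i] = (float)i;

          float* dev = NULL;
          cudaMalloc(&dev, n * sizeof(float));

          // Explicit host -> GPU copy...
          cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

          // ...the triple angle bracket launch...
          scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);

          // ...and the explicit GPU -> host copy back.
          cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);

          cudaFree(dev);
          free(host);
          return 0;
      }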

      Again, without using the API it’s hard to tell, but flattening or breaking up the linear graphics pipeline has a huge impact on the “capabilities” of the compute shader even if its functionality doesn’t change, because it should mean that there is a much wider range of situations in which compute shader functionality can be mixed with graphics processing or, more interestingly, used entirely independently of the graphics pipeline.

      I’ve got to believe the chip vendors have the most at stake on Khronos because they’re locked in a death dance with one another in which they need OpenGL but don’t want it to advance in any way that benefits a competitor more than themselves, yet advancing OpenGL is one of the best tools they have to pressure Microsoft and Apple to support their newest chip features. It’s a very tricky balance.

      • GL/DX compute shaders are separated from the linear graphics pipeline also. They’re like CUDA/CL shaders, except written in GLSL/HLSL, and with somewhat better integration with the GL/DX API resources.

        Metal has some cool stuff, but that… is what everyone else has been doing also.

      • Also, the [[texture(0)]], [[global_id]] stuff is hopefully quite obviously Metal extensions. It reminds me a little of Microsoft’s C++ AMP project for C++-on-the-GPU. If they ever finish and release the specification for that…

        • I think they shipped and revised C++ AMP at least once; it definitely works on Windows 8. It was far from an alternative to CUDA the last time I looked at it, however.

  2. I think some folks reading this may not fully appreciate how Metal is an interesting departure from classical graphics APIs. Metal doesn’t necessarily change anything about the hardware functionality; it exposes the programming paradigm in a new and more generalized way. Here’s the quote from the Apple documentation that mostly says it all:

    “The Metal framework is a single tightly integrated interface that performs both graphics and data-parallel compute operations on the GPU. Metal uses the same data structures and resources (such as command buffers, textures, etc.) for both graphics and compute operations.”

    Somebody reading this might say “Well… OpenGL ALREADY DOES THAT,” which would technically be true and yet completely miss the point. The key point is that the linear graphics pipeline is GONE, which actually changes everything. It’s understandable that the implications of such a transition would be difficult for people who are versed in the linear graphics pipeline to get their heads around, but they are huge. They spell not only the end of the linear graphics APIs like OpenGL and Direct3D… at least as they were… they may spell the end of the CPU.

    • 1) DX and OGL both have more computationally potent alternatives to compute shaders.
      2) The underlying GPU architecture is still FIXED: first vertex, then fragment, with compute alongside. (I’m not sure how firm the boundaries are.)

      So Metal right now (to me at least) is a new viewpoint, or paradigm, by which one can look at the GPU; however, any computation is done the old way, right now.
      (With most of the performance gains achieved by a better design of the DRIVER, which is parallel and caches everything and their grandmas ;) )

      Maybe now the driver is free to divide the GPU into subparts and assign some jobs to them separately. (So something like introducing HyperThreading to the world of CPU design.)

      But is the A7 design capable of actual separate operations?

      (BTW, the blit engine manages memory movement AFAICS; that also means separating data transfer from the processing of that data, which is more than the “traditional” role of a 2D engine on a dGPU.)

      For that matter, Mantle’s justification was draw calls, but slides about separately programmed engines on a single GPU were always there. Maybe AMD has less work to do on their GPU architecture to implement your vision of GPUs.

  3. As I study the Metal documentation more closely I can’t help but be reminded of Direct3D 1.0, with its early and much-maligned Execute Buffers and the page of code it took to draw a triangle on the screen. It will be funny to see whether Apple developers bitch about these “features” of Metal as loudly as game developers did about them in Direct3D 1.0, 20 years later. It’s funny, because if early 3D developers had not cried so much about Direct3D Execute Buffers being too hard to master, it’s likely that the modern DirectX API would look essentially like Metal does now and would not have the performance limitations that it has today because it was “simplified”.

    Here’s the Metal code required to render a single triangle on the screen in 3D…

    static const float posData[] = {
         0.0f,   0.33f, 0.0f, 1.f,
        -0.33f, -0.33f, 0.0f, 1.f,
         0.33f, -0.33f, 0.0f, 1.f,
    };

    static const float colorData[] = {
        1.f, 0.f, 0.f, 1.f,
        0.f, 1.f, 0.f, 1.f,
        0.f, 0.f, 1.f, 1.f,
    };

    // Command queue and command buffer
    id<MTLCommandQueue> commandQueue = [device newCommandQueue];
    id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

    // Vertex position and color buffers
    id<MTLBuffer> posBuf = [device newBufferWithBytes:posData
                                               length:sizeof(posData)
                                              options:nil];
    id<MTLBuffer> colBuf = [device newBufferWithBytes:colorData
                                               length:sizeof(colorData)
                                              options:nil];

    // Color attachment and framebuffer
    MTLAttachmentDescriptor *colorAttachment =
        [MTLAttachmentDescriptor attachmentDescriptorWithTexture:currentTexture];
    [colorAttachment setLoadAction:MTLLoadActionClear];
    [colorAttachment setClearValue:MTLClearValueMakeColor(0.0, 0.0, 1.0, 1.0)];
    MTLFramebufferDescriptor *fbDesc =
        [MTLFramebufferDescriptor framebufferDescriptorWithColorAttachment:colorAttachment];
    id<MTLFramebuffer> framebuffer = [device newFramebufferWithDescriptor:fbDesc];

    // Render command encoder
    id<MTLRenderCommandEncoder> renderEncoder =
        [commandBuffer renderCommandEncoderWithFramebuffer:framebuffer];

    // Compile the shader source and look up the vertex/fragment functions
    NSError *errors;
    id<MTLLibrary> library = [device newLibraryWithSource:progSrc options:nil error:&errors];
    id<MTLFunction> vertFunc = [library newFunctionWithName:@"hello_vertex"
                                                    options:nil error:&errors];
    id<MTLFunction> fragFunc = [library newFunctionWithName:@"hello_fragment"
                                                    options:nil error:&errors];

    // Build the render pipeline state
    MTLRenderPipelineDescriptor *pipelineDesc = [MTLRenderPipelineDescriptor new];
    [pipelineDesc setVertexFunction:vertFunc];
    [pipelineDesc setFragmentFunction:fragFunc];
    [pipelineDesc setPixelFormat:currentTexture.pixelFormat atIndex:0];
    pipeline = [device newRenderPipelineStateWithDescriptor:pipelineDesc error:&errors];

    // Encode the draw and submit
    [renderEncoder setRenderPipelineState:pipeline];
    [renderEncoder setVertexBuffer:posBuf offset:0 atIndex:0];
    [renderEncoder setVertexBuffer:colBuf offset:0 atIndex:1];
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:3];
    [renderEncoder endEncoding];
    [commandBuffer commit];

  4. I would suggest watching the 3 Metal WWDC talks. They are available for free, without registration, on Apple’s developer WWDC videos site. Even the PDF presentations are available for free.

    The reason OpenGL lagged on OS X is that Apple only provided what their GPUs support, which includes 3 old laptops. Intel themselves don’t support 4.2 yet. You should know this much.

    OpenCL 2.0 won’t be ready until the end of the year, so that’s why there is no news. iOS doesn’t support OpenCL because the Imagination GPU is optimized for 16-bit float. Why would you do anything when OpenCL is about double precision?

    Metal consists of two parts: first an Objective-C API, and a shading language based on C++11. It can take an OpenCL kernel and work without too much problem. Metal is also about sharing memory between the CPU and GPU; that’s why it is not useful until Intel supports OpenCL 2.0 in their GPUs. Plus Macs also have AMD and NVidia, which have to have the same support.

    In fact Apple had to write their own scaling code just to do Retina Displays because Intel’s didn’t match the discrete GPUs. Apple spent 5 years getting ready for the Retina Display. Now I am sure you will critique Microsoft just as intensely for their past 30 years of dumbassery.

    Again, these are simple things that you as an expert should know. Not knowing just shows your bias.

    • Uh… what bias are you referring to? I keep praising the Metal API. Sure I have bias, but I don’t recall criticizing Apple. None of the information you cite is relevant to my point, which is why I don’t make any reference to it. Did Apple announce OpenGL 4.2 support along with Metal at WWDC, now that it’s apparently NOT a problem? No? Huh… I’ll be darned… How many WWDC presentations do I need to watch to find NO OpenGL 4.2 announcement? OpenCL, which Apple created, wasn’t designed by Apple to work on their own devices, and Apple tossed it into the public domain while they keep every other graphics API they make closed and proprietary? Huh… I’ll be darned… maybe they made it public because they don’t give a damn about it like they do their other graphics APIs?

      Metal can’t take an OpenCL kernel without problems; it has to be ported to a different syntax. They’re clearly NOT compatible, just similar.

      Again who is criticizing Apple? You Apple fans have weirdly glass jaws… why so emotionally fragile about simple discussion of their API strategy?

    • Actually I went ahead, took your advice and read some more of the WWDC transcripts. Did you see the bit where Tim Sweeney, WITH Apple engineers, just outright agrees with my analysis?

      Tim Sweeney: That’s right. Metal is a low-level rendering API, which means it provides the absolute minimum layer of software needed to support multiple versions of different graphics chips. It shields developers from the very low-level implementation details. It replaces OpenGL ES, which is an ancient relic of the Silicon Graphics era.

      Tim Sweeney: OpenGL ES is the bane of our existence there, because it not only has a lot of overhead, but it has all that overhead on the platform where you can least afford it.

      REPLACES OpenGL ES… That’s pretty clear, isn’t it? Why would Tim Sweeney, working closely with Apple on Metal, erroneously come to believe that it was an OpenGL-replacing API? Do you think Tim should read all of the WWDC transcripts to help him get his head right about Apple’s plans for OpenGL, or is it more likely that Apple TOLD Tim their plans? Bye bye, OpenGL…

      • Well, did you see Unity’s Aras on his Twitter feed saying it was the fastest and easiest port for any API?

        Those that want portability will get OpenGL ES.
        Apple has SceneKit, which is also an Obj-C API for 3D scene graphs.
        Apple has SpriteKit, which is for 2D games in Obj-C.
        Apple has GLKit, which wraps OpenGL in a C API.

        OpenCL didn’t get updated because 2.0 isn’t ready. OpenGL didn’t get updated because Intel’s GPUs don’t support more than 4.2.

        The Metal Shading Language combines GLSL and OpenCL kernels in a C++ API; obviously there are going to be changes and differences.

        Apple solved a problem for their developers. They asked for this API. Now those developers that use a high-level API will get Metal without learning OpenGL. It is as simple as that. In the future most of this will be converted to Swift anyway.

        PS. I had to change my ID because the comment system was rejecting my comment.

        • Well then sure, DirectX is just like OpenGL when you put a wrapper around it.

          I’ll have a look at the comment filter system, I’ve had other complaints about it being “rejecty”.

  5. OpenGL has been held hostage by CAD companies for over a decade. That is why ES was created. OpenCL was created by Apple to bring CUDA-style compute outside NVidia GPUs. NVidia was so scared that they quickly adopted LLVM and JIT compiling into CUDA. Now NVidia has been stuck with OpenCL 1.1 while everyone else has moved on to 1.2. NVidia is even more scared of OpenCL 2.0 because it is really for integrated GPUs with shared memory. So you expect Apple to revolutionize OpenGL, which is designed by committee?

    Apple will even modify their Imagination GPU just like they did with the ARM core.

    If Apple wants to replace 30-year-old Obj-C with Swift, you better believe OpenGL will be relegated to CAD support.

    So obviously Apple is up to no good. No one has a problem with you critiquing the technology or API, likes, dislikes, but trying subtle FUD to confuse younger people is what the people who are commenting are objecting to. OpenGL will be supported as long as people use it. It is not within Apple’s power to shape a modern OOP API that is cross-platform and that you will thank them for. There are limits.

    • What you just said is far more confusing and FUDDY than my simple and comprehensible observation that Apple is ditching OpenGL support. As far as I can tell, you just said a lot of stuff that converges on agreeing with exactly what I said: OpenGL is holding them back, they’re ditching it… simple! If you disagree with that statement, I can’t figure it out from everything you’ve said about it. Of course it’s within Apple’s power to shape modern open APIs; they just did by shipping Metal, and I did by shipping Direct3D. If Apple hadn’t just irrevocably influenced OpenGL’s destiny, nobody would be here on my blog fretting about it.

  6. As a 24x7x365 CUDA hacker I think Metal really speaks to those of us who are focused on compute + interop but have come to realize that compute is a second-class citizen on the desktop and inaccessible on mobile.

    A silo’d compute API is _perfect_ when it’s driving TFLOP+ discrete devices. But when your modest GPU is part of a SoC or an IGP and in the hands of hundreds of millions of people then shipping and supporting another API for compute was always a risk.

    I think you and others have spelled out the technical and strategic benefits of Metal but I think what hasn’t been mentioned is that this is an indication that the ImgTech Series6 GPU is the first shipping (in quantity) mobile GPU that was ready for an API like Metal.

    The recent Intel IGPs (Iris HD5x00) in MacBooks/Airs would also seem like an obvious match for Metal on OSX.

    I wonder how many GPU vendors that struggle to ship expensive to develop feature-laden GL drivers are eyeing the Metal API and convincing themselves that a clean slate approach would let them stop trying to catch up to the leader’s drivers (NVIDIA), reduce costs/risk and expose more performance?

    Exciting times for consumer-oriented GPUs.

    TS: “I just assume that it has CUDA like properties such as the ability for a GPU kernel to launch other GPU kernel functions. I don’t recall noticing that Metal kernels supported this… if they don’t then it means that the API is still CPU bound,”

    This capability (Dynamic Parallelism) arrived in sm_35 and is more of a convenience on small GPUs where it’s easier to fully utilize the multiprocessor(s). You can enqueue many thousands of traditional (non DP) kernel grids and the CPU won’t be the bottleneck.

    • So you’re saying that Apple is shipping an OpenGL 4.2 capable GPU but DIDN’T bother to update their OpenGL support for it. Hmmm… interesting…

      Dynamic Parallelism was only available on Nvidia’s BIG Kepler-based GPUs until Maxwell, as I recall. Now that Nvidia has Managed Memory, the capability starts looking more like the functionality you would need for a parallel OS or game engine to run entirely on the GPU without the CPU’s interference at all. As you are probably aware, CUDA has CPU callback support from the GPU as well. Dynamic Parallelism is one of the features a GPU needs to abandon CPU dependency altogether.
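
      For reference, here’s a tiny sketch of the two features I just mentioned: Managed Memory, and a CPU callback that fires when the GPU’s stream reaches it. The kernel is a throwaway placeholder.

      #include <cuda_runtime.h>
      #include <cstdio>

      __global__ void simulate_step(float* state, int n)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) state[i] += 1.0f;   // placeholder "game" work
      }

      // Host function invoked when the stream reaches this point, no polling required.
      void CUDART_CB on_step_done(cudaStream_t stream, cudaError_t status, void* userData)
      {
          printf("step complete, CPU notified\n");
      }

      int main()
      {
          const int n = 1 << 16;
          float* state = NULL;

          // Managed Memory: a single pointer visible to both CPU and GPU.
          cudaMallocManaged(&state, n * sizeof(float));
          for (int i = 0; i < n; ++i) state[i] = 0.0f;

          cudaStream_t stream;
          cudaStreamCreate(&stream);

          simulate_step<<<(n + 255) / 256, 256, 0, stream>>>(state, n);
          cudaStreamAddCallback(stream, on_step_done, NULL, 0);

          cudaStreamSynchronize(stream);
          cudaStreamDestroy(stream);
          cudaFree(state);
          return 0;
      }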

        • Oh funny, Apple’s OpenGL support was so vanishingly small that I had to re-read the article several times to figure out that they were actually included in it.

        • This is the first time I’ve gotten a flood of comments from people who seem to be upset about something they can’t express. I can’t figure out if they are mad at Apple or mad at me for noticing that Apple is moving away from OpenGL… or why they care, since OpenGL support on mobile OBVIOUSLY sucks for everybody, so why love OpenGL for no good reason? My garbage is “open” to the public; that doesn’t mean it has any value… being “free” and “open” is generally an indication that something has LOST its value. OpenGL, when it was at its best, was proprietary with a thin open veneer. Now that it has lost Apple’s support, it’s likely to deteriorate faster. I’m performing a public service by pointing it out. APPLE wants the market to get that message. It’s obviously a huge concern to a lot of developers, so why hasn’t Apple addressed it by proclaiming their renewed love and affection for OpenGL, unless they feel they have already delivered their message with the launch of Metal and NOTHING ELSE?

          • Mention anything Apple and anything proprietary that they do and you’ll wake a horde of raging open computing white knights who’ll try and “prove” how Apple is evil and undermining the industry and what not.

            It’s kind of sad that so many people get their conspiracy theories hat on when it’s absolutely clear what Apple’s motivations are (and have been for a long while now):

            Apple does things that (they believe will) benefit Apple.

            They don’t have the time, nor the energy to think much about broader implications of killing OpenGL or not. Even though it might very well happen, it’s not something Apple cares about, in my opinion, and I think you agree.

            I also believe that most people who work on games and actually have a product to ship are, these days, far beyond writing their engines from scratch, so the “takes more code to draw a triangle” point is getting moot.

            But those are not the kind of arguments that even get through to the Raging Knights™, so next time you write something about Apple, just expect such comments in advance and try to ignore them. :)

  7. @ “The Saint”, you brought out a good point:
    “This makes the host operating system, CPU and bus a growing performance bottleneck.  Furthermore, GPUs have gotten so fast and versatile that they are increasingly capable of running the whole game…”

    This makes me wonder whether mobile GPUs will eventually be powerful enough to decode video. Currently, for video texturing or streaming, the mobile CPU does the decoding, eventually providing pixel buffers that have to be passed on to the GPU. Transferring the data over to the GPU hogs all resources (memcpy). A few years back Imagination Technologies came out with Zero Copy Transfer to get rid of the memcpy for copying pixels to the GPU.

    Even on iOS I created a “Video Texture Atlas” to get better performance… Now that Zero Copy is nothing new, I am wondering when I can do all the decoding, texture setup, and rendering entirely on the GPU…

    • I confess that I avoid mobile development in favor of supercomputing, but I know the Nvidia K1 can certainly do it; unfortunately it’s not a mainstream mobile component. It sounds like mobile buses are the bottleneck, based on your description. The next generation of Nvidia chips will actually have ARM processors on the GPU board sharing access to memory. I believe the K1 already has this architecture, which means it probably does the job. (I say “probably” because I have one to play with but have been so preoccupied with my K40 that I haven’t had time to get to it.)
