Collaboration and Open Source at AMD: Blender Cycles

This article is part of an occasional series about what developers can do when they collaborate. AMD is a real believer in open source projects. Our developers actively contribute to and maintain a variety of open source projects, from highly optimized math libraries to… well, let’s talk about Blender Cycles.

Blender is a free, open source 3D animation suite that includes the Cycles render engine. Cycles converts a 3D model into the 2D representation you see on the computer screen using ray tracing. Ray tracing is a very math-intensive process; so math-intensive, in fact, that it is not commonly used in games or other real-time applications. It produces highly realistic visual effects and is typically used when rendering for film, or in other cases where real-time results are not required but high fidelity is. Using the compute capability of a GPU can dramatically improve the performance of such renderers.

AMD undertook to improve support for GPU compute inside Blender Cycles. Prior to this effort, the GPU kernel used for rendering was monolithic and huge. Because of the kernel’s size, the generated code had to spill and unspill registers. These spill/unspill operations slow performance and reduce occupancy. (Occupancy is the actual number of waves running on the GPU simultaneously; more is better.)
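To make the occupancy point concrete, here is a back-of-the-envelope sketch in C. The register and wave counts are illustrative assumptions, not exact figures for any particular GPU, and real hardware quantizes register allocation in ways this ignores:

    #include <stdio.h>

    /* Illustrative numbers only; not exact figures for any real GPU. */
    #define REGS_PER_LANE 256  /* vector registers available per work-item slot */
    #define MAX_WAVES      10  /* hardware cap on simultaneous waves per SIMD */

    /* A kernel that needs more registers per work-item leaves room for
       fewer waves in flight, i.e. lower occupancy. */
    static int waves_in_flight(int regs_per_workitem)
    {
        int by_regs = REGS_PER_LANE / regs_per_workitem;
        return by_regs < MAX_WAVES ? by_regs : MAX_WAVES;
    }

    int main(void)
    {
        printf("huge kernel (128 regs): %d waves\n", waves_in_flight(128)); /* 2 */
        printf("small kernel (32 regs): %d waves\n", waves_in_flight(32));  /* 8 */
        return 0;
    }

And once a kernel needs more registers than exist at all, the excess spills to memory; every spill and unspill is extra memory traffic on top of the real work.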

In addition to producing inefficient code, the compiler would sometimes fail to complete the build, or would generate incorrect code that could lead to black screens or a kernel hang. These are certifiable “bad things.”

At the Blender WIKI page on OpenCL™, you’ll see this: “Cycles was included into blender with the release of 2.61 in December 2011. The release notes mention: ‘… OpenCL, which is intended to support rendering on AMD/ATI graphics cards’. Ever since the support or lack thereof in cycles has been a topic of debate.” For sure.

(Note: it’s a WIKI page. This text might be updated at any time. This is what was there before we did our work!)

There is a Q&A at the end of that WIKI page, and one exchange is intriguing.

“Q: Why don’t you just split up cycles so it can run better on AMD hardware?”

“A: While this would likely help it is not a trivial matter to split up cycles in this way. Also it is not clear that it is going to help and how much. As a resource constrained open-source project this will most likely not be a top priority.”

That’s the need that AMD attacked. It is (or perhaps I should say “was”) a decidedly non-trivial task to split the existing monolithic and large kernel. A resource-constrained open source project has to prioritize carefully. So AMD dived in to help.

(Edit for clarification: We didn’t do this entirely on our own, by any means. Like any well-run open source project, there is a commit process. Our submission was reviewed and modified by the Blender community.)

We turned the monolithic kernel into a pipeline of about 10 new small kernels that run in sequence. Conceptually, the algorithm works like this: first, move all the required data from the CPU to the GPU. This transfer takes time, and is a performance hit, but the computational gains will outweigh the cost. Then the smaller kernels operate on the data, communicating via device memory to avoid the relatively slow round trip to the CPU.

Once the small kernels have processed the data, the data goes back to the CPU. The CPU may then repack the work for the next iteration, so it processes more efficiently. Multiple iterations follow until enough have occurred to provide the image quality desired. The number of iterations varies, depending upon the nature of the scene being rendered, and the quality settings.
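Put together, one iteration looks roughly like the sketch below in OpenCL host code. This is a minimal sketch under simplifying assumptions: the stage kernels and the single shared state buffer are hypothetical stand-ins for the real Cycles data structures, and error checking is omitted.

    #include <stddef.h>
    #include <CL/cl.h>

    /* One render iteration: upload state once, run the small kernels in
       sequence against shared device memory, then read the results back
       so the CPU can repack the work. */
    void render_iteration(cl_command_queue q,
                          cl_kernel stages[], int nstages, /* the ~10 small kernels */
                          cl_mem dev_state,                /* path state in device memory */
                          void *host_state, size_t bytes, size_t nrays)
    {
        /* One bulk upload per iteration: a cost, but paid only once. */
        clEnqueueWriteBuffer(q, dev_state, CL_FALSE, 0, bytes, host_state,
                             0, NULL, NULL);

        /* Each stage reads and writes the same device-side buffer, so
           intermediate results never make the slow round trip to the CPU. */
        for (int i = 0; i < nstages; i++) {
            clSetKernelArg(stages[i], 0, sizeof(cl_mem), &dev_state);
            clEnqueueNDRangeKernel(q, stages[i], 1, NULL, &nrays, NULL,
                                   0, NULL, NULL);
        }

        /* Blocking read: results come back to the host only after the
           last stage has finished. */
        clEnqueueReadBuffer(q, dev_state, CL_TRUE, 0, bytes, host_state,
                            0, NULL, NULL);
    }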

The packing process performed on the CPU is significant. Multiple waves that each end quickly are packed together into fewer waves, increasing utilization. Each iteration becomes more and more efficient, typically following an asymptotic curve – lots of improvement in the second iteration, a bit less in the next, and so on.

For example, assume you have a GPU with the capacity to run 50 waves simultaneously, but 500 waves’ worth of work to process. You start with 10 batches of 50 waves. After the first round, the CPU packs the data, and the number of waves might drop to, say, 300. Waves that ended quickly, leaving compute units idle while they waited for other waves to complete, are packed together into fewer, fuller waves. That means less idle time in the next iteration. Not only that, there are now fewer batches to process.
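The packing step in that example is essentially stream compaction. Here is a minimal sketch in C, assuming a hypothetical PathState record with an alive flag; the real Cycles state carries far more per path:

    #include <stddef.h>

    /* Hypothetical path state; illustrative, not Cycles' actual layout. */
    typedef struct {
        int   alive;          /* did this path survive the last bounce? */
        float throughput[3];  /* ...plus everything else a path carries */
    } PathState;

    /* Compact surviving paths into a dense prefix of the array so the
       next batch of waves launches with no idle work-items. */
    static size_t compact_paths(PathState *paths, size_t n)
    {
        size_t live = 0;
        for (size_t i = 0; i < n; i++)
            if (paths[i].alive)
                paths[live++] = paths[i];
        return live; /* fewer states means fewer, fuller waves next round */
    }

In the 500-wave example, a step like compact_paths is what shrinks the work from 500 waves’ worth of states to roughly 300.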

So, what does this look like in the real world? Well, a Beemer seems like a nice car, so we picked one as our model to render. You can get the Blender model here; it’s a pretty nice model.

UPDATE and Editor’s Note: Based on excellent feedback from our readers, we have added info on discrete GPU performance. The text below reflects this change.

We rendered the image in Figure 1 three ways. Precise system details are below.

  • using the CPU for all calculations
  • using the integrated graphics capability of an AMD APU
  • using an AMD graphics card

Figure 1: The BMW1M-MikePan Blender model

Handily, Cycles tells you how long it takes, so no stopwatches were injured during these tests.

So, what difference does it make to enable GPU Compute in the new Blender Cycles?

See for yourself.

Figure 2: Test results show significant speed improvement using GPU Compute

For the CPU Only and APU Compute tests, we used an AMD A10 7800B APU. The computer had 8 GB of memory. We were running Windows® 8.1. For the discrete GPU test, we used a Radeon™ HD 7970 (Tahiti).

As noted earlier, ray tracing is very math-intensive. Without GPU compute, rendering this model took a bit more than 38 minutes. The APU test took 9 minutes 38 seconds, roughly a 4x speedup. With the Radeon graphics card, it took 1 minute 42 seconds, roughly a 22x speedup.

Your mileage may vary. For example, the Radeon graphics card has 32 compute units vs. 8 for the APU, so it is clearly capable of getting through the math faster. There are also many options you can set in Cycles (like quality) that affect how long an image takes to render. Nonetheless, this gives you a flavor of the kind of speed improvement GPU compute can provide.

These changes to the OpenCL kernel inside Blender Cycles are available in version 2.75. You can get Blender 2.75 here.

So who wins here?

AMD wins. Our culture of supporting open industry standards and open source projects means that software runs better on our hardware. From our perspective, cool tools like Blender make our graphics cards more popular. We are a for-profit company after all. But why not help everyone else along the way?

Users win. Blender users, creative artists—many of them perhaps in the indie community—get a faster tool.

Developers win. Remember that quote from the Blender WIKI about this being non-trivial work? Very true. Somebody needed to do it, and we have the skill and the experience. AMD’s work is now out there in the open source community for you to see and learn from. Not to mention the direct benefit to those who, in their copious spare time, build and maintain Blender. Our hat is off to you; we’re happy to help out.

More generally, we build tools for developers so you can accelerate code by taking advantage of the available GPUs. You can learn about and download the Accelerated Parallel Processing SDK at AMD’s Developer Central website. You can learn a lot at our blog series, OpenCL 2.0 Demystified.

This is a win-win-win scenario. What’s to lose?


Jim Trudeau is Senior Manager for Developer Outreach at AMD. Links to third party sites and references to third party trademarks are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.

Windows is a registered trademark of Microsoft Corporation. OpenCL is a trademark of Apple Inc. used by permission by Khronos. No endorsement of AMD or any of its products by BMW AG is expressed or implied.

47 Responses

    • BillDStrong

      Daz Studio is not a situation they can change. DAZ3D chose to partner with Nvidia for their proprietary Mental Ray technology, which uses Nvidia’s closed CUDA programming language.

      Nvidia would need to open the specs for CUDA to be implemented on other hardware. Now, for DAZ, you could use Reality 4, which uses LuxRender under the hood, and does in fact support OpenCL and thus AMD or Nvidia GPUs.

      Or, Blender’s Cycles is available as a standalone renderer, and could be made into a plugin for DAZ Studio.

  1. Terry Ritter

    What in the world were you thinking? Moving data back and forth between CPU and GPU on a Kaveri APU? That should be history by now. If you want to show how things work better on AMD hardware, you need to demonstrate CPU and GPU independently working on common data in main memory, which thus never need be “moved.”

    • Jim Trudeau

      Hey Terry, thanks. I absolutely agree with you, but I do not know the state of the Blender code base WRT support for OpenCL 2.0 features like shared virtual memory. The engineer I’m working with on this will be back early August. I’ll dig into that. If it’s supported, we’ll run this on that hardware too, and on a dGPU so we all get a good idea of the performance impact. If SVM/OpenCL 2.0 is not supported by Blender, I’ll update the blog to indicate that. We don’t own the code base, so we can’t always get what we want. 🙂

    • jtrudeau

      Terry, an update. We did not push OpenCL 2.0 changes upstream. What we committed was major as it was, and the community had a lot of work to do to process, tweak, improve, and test the commit. So no support at this time for shared virtual memory.

  2. Santiago Shang

    Well, I’ve always used AMD cards, until Cycles came onto the Blender stage… and the problems with OpenCL pushed me to use Nvidia cards… So… I hope you guys… and the Blender Foundation do a very nice integration so I can come back to AMD cards. 🙂

  3. GraphiX

    Hey. I didn’t know you helped with FOSS projects. Thanks! I think that every for-profit software company should take a FOSS program under their wing. It would make the art community a lot bigger and more open.

  4. Piotr Adamowicz

    Kudos to the wonderful people at AMD! Many companies wouldn’t have bothered unless it was a trivial fix, but AMD really came through for the Blender community! Many thanks!

  5. dufloch

    Your work on Cycles is really amazing. The original patch was even better than what is now in the official build: it was about 50% faster, thanks to the original “selective node compilation” implementation. Could you further help the Blender devs to get that full power back (their implementation is much more limited, with predefined groups of nodes), or, if that’s not accepted, build an AMD version of Blender with just this change, like Fluid Designer?

    • jtrudeau

      Well, I don’t run the company, but I’ll guess that us building a unique version of Blender is not likely to happen. We typically want to push generally available solutions, not one-offs, and work with the community so the codebase is maintained. However, the feedback and widespread desire for continuing improvement, which is clearly possible, THAT I guarantee will get to the engineering team who worked on this.

  6. PGT

    Thank you, you helped so many Blender users.
    I am curious: will you keep working on open source Blender?

  7. architekt

    This is great news for the future development of Blender Cycles.
    But will AMD’s driver development keep up with Nvidia’s? I am currently on an AMD motherboard and processor, but my graphics card is Nvidia, which is difficult to give up just because of the drivers.
    We will see …

  8. Anuga

    So, basically, now it’s safe to go buy AMD’s Graphic cards for rendering purposes, in Blender? 🙂

    • jtrudeau

      Well, safer anyway 🙂 Seriously, the number of possibilities, options, hardware configurations… I learned a long time ago not to make absolute guarantees on anything technical. One thing I overlooked when we were working on this was to run this same model on a dGPU. As noted in another reply, I’ll revisit that in August when my engineer gets back from leave.

      • Futurehack

        Kudos to the whole AMD team for giving back to the community. There are many excited AMD customers on Blender discussion forums talking about this. Please do keep up the good work.

        Anuga – Some caveats to be aware of with the current Blender 2.75 (+Cycles) release:

        “Only Windows and Linux are officially supported. On OSX there are still issues”
        http://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.75/Cycles
        “No support for HDR (float) textures at the moment.”
        http://blender.org/manual/render/cycles/gpu_rendering.html
        No OpenCL (AMD GPU) rendering of Transparent Shadows, Volumes, or SSS:
        http://blender.org/manual/render/cycles/features.html

        • dufloch

          Transparent Shadows work in the latest buildbots, so they will be in Blender 2.76 as well. Note that they are a bit slow at the moment, and you need Catalyst 15.7 (a version which also nearly doubles the rendering speed compared to the 15.6 beta :). SSS and volumes work only in experimental mode for CUDA, and are slow and buggy (they take a huge amount of RAM, which leads to crashes or the scene not rendering at all). Get an 8GB card if you can; then you are good to render anything you can think of.

          • e123

            Just posting to confirm that the latest buildbot (compiled ~july 21st, win x64) does fix transparent shadows.

            They do have a pretty noticeable performance impact for now, as you said. Definitely a good bit of room for optimization there.

            Currently, I’m seeing over double the time for the exact same scene/settings on 15.7. The heavy performance impact is a clear outlier when compared to other OCL engines.

  9. jtrudeau

    A bit of a universal reply: you’re welcome. 🙂 Based on the feedback so far, I have a little work to do. If it makes sense, I’ll update this blog in early August with some more data regarding OpenCL 2.0 on an APU and performance on a discrete GPU. And the clearly-heard desire for continuing improvement will get to the engineering team. I can’t speak for priorities, because like everyone we are resource-limited as well. But your feedback will help inform whatever decisions are made. Thank you.

  10. iccha

    Thank you so much for these amazing changes you made to the renderer. Even though I don’t have an AMD card now (I used them many times before, but not currently), I also see it this way: in the future I can choose your hardware again.
    It is just nice to see a big company giving something back to people.
    Also, regarding Cycles, I have been struggling with bugs in the baking code. As baking is all about games, where AMD cards have a big share too, wouldn’t AMD also consider improving the baking part of Blender Cycles?

    • jtrudeau

      Thanks for the suggestion, iccha. I’ll make sure the team hears it. I suspect there’s more work to do in Cycles, but they’ll make priority calls.

      • Douglas E Knapp

        I am happy to see your work. I have not used AMD for over 15 years because of the lack of Blender support, or perhaps I should just say that Nvidia just worked, so that is what I have had to buy.

        Blender is the most complicated program that I know of. I am sure that Blender development could use up the whole AMD staff, if you let it. I hope you do! 🙂 The more great Blender Devs the better!

        Obviously I plus-one the idea of giving the game engines some love, but baking is very important even in film use, because many indie filmmakers can’t afford to re-render textures. They need the speed of baked textures because they don’t have the cash for a render farm.

  11. Shane

    Well, well, well. What would have taken over an hour to render with my CPU (i5-3570k at 1000 samples) took a mere 10 minutes and 21 seconds… Needless to say, I am very, very impressed.

  12. e123

    Okay, some reports here:

    Firstly, it’s working very well for me. I’m primarily interested in foliage/outdoor/landscape rendering. Car shader/glass performance doesn’t interest me as much.

    That said, I’m impressed by the firefly rejection. I assume it has something to do with the Sobol implementation.

    It’s easier for me to show the current performance through a screenshot:
    http://ibin.co/29XAeH2875vX
    (I swear on my life that is a non-malicious link)

    You get fantastic OpenGL viewport performance, even while the scene preview is rendering on both GPUs (2x Hawaii, 15.7, Windows 10 build 10162; about to throw my Tahiti back in). The scene above is a non-ideal situation. The cap textures on the tree (custom textures) are not yet trimmed, exposing some white vertices (a bad thing for raycasting), and I have broken transparency on the default ST library bamboo, non-instanced. Regardless, the performance is still very good. Edits while actively rendering are very responsive.

    I can induce crashes sometimes (you can in basically all software if you try hard enough) by importing 6k x 4k textures while actively rendering. That said, I’ve yet to find a clearly reproducible crash. It usually works without a problem.

    Now, I am impressed by the performance for my particular focus and the general blender interaction (as expected from a native engine). However, I still find luxrender to often provide more raw performance when utilized correctly. No, it’s not a native engine and it’s not as easy to use, but it is rather impressive.

    I hope you have at least established a dialog with Dade at LuxRender. Note their independent implementation of micro-kernels, while the main Blender GPGPU compute dev continually attempted to explain his decision to 1) target CUDA and 2) target giant, monolithic kernels.

    Correct me if I’m wrong, but a giant monolithic kernel runs directly contrary to fundamental GPGPU programming practice.

    Please, please just extend the same level of support to LuxRender as you have to Cycles, even if, technically, Dade and the LuxRender team do not require the same level of assistance in properly utilizing the raw compute performance offered by AMD hardware.

    This is fantastic work that you and your team have accomplished, I do not want to diminish your accomplishment in any way.

    You, your team and AMD have dealt with a situation that shouldn’t have existed (in my opinion, at least) in an incredibly pro-active fashion.

    I think it was over 2 years ago that I saw AMD commit to solving the monolithic kernel issue in cycles in the dev forums. The commitment to support a rather antagonistic software base and end results really speak for themselves, I think.

    This work will be underappreciated, but take it from someone who has been building entirely AMD systems since the Duron era (and never experienced an AMD hardware failure, even with hard volt modding and inaccurate variable resistors), it is very, very much appreciated by some of us.

    • jtrudeau

      Another real-world report. To me these are worth more than anything I do. I am not a digital artist! It is gratifying to see what I believe to be true confirmed.

      I will pass your comments re LuxRender to the team. We aren’t resource-rich either, so they’ll make priority calls on where to get the most bang for the buck, both in terms of benefit to AMD and community impact. In learning and researching for this article on Cycles, I came across LuxRender. Who knows, maybe I’ll get to write about that too 🙂

      Thanks for the feedback and the details. Genuinely appreciated.

      • e123

        Likewise! I really appreciate the reply.

        In terms of resource constraint, I was thinking a dialog with the luxrender team could be mutually beneficial to both kernels in some ways. I don’t know if that’s true, really just the fact that you’re aware of luxrender is the important part.

        Anyways, it’s all good here, as-is. I am currently getting fantastic performance and stability with both engines (for instance, 2x Hawaii and 1x Tahiti at ~ stock clocks match 4x 980 overclocked in luxrender).

        Really, I can’t reasonably ask for more than that, even if I set resource constraints aside 😉

        Sounds like 32-bit is well within the pipeline for Cycles; looking forward to it, and I wish you a smooth implementation.

        Many thanks for making this happen!

      • e123

        Hi again,

        Forgot to ask you if testing on cayman would be of any value. I have a stack of them here, just let me know and I will switch them out.

        IIRC, full/proper support is limited to GCN 1.0, correct?

  13. Galoa

    This is great, keep up the good work (but probably remove the screenshot with the Intel CPU and nVidia GPU that you have posted).

    • jtrudeau

      Sharp eyes! I am not a regular Blender Cycles user. I believe that the CPU, GPU, and OS listed are the system where the model was created. It is definitely not the system on which we run the model. But I’ll confirm.

      • jtrudeau

        I did replace the screenshot. I have learned that there are various panels in Cycles, and what was displayed in the original may be someone’s (model creator?) benchmark setup, which is useless in this context. The new screenshot is based on the dGPU run and data I added based on user feedback.

  14. Tim Tuttle

    Bottom line me on a new graphics card for Blender. Does size matter? 2GB vs 4GB? What card do you recommend? Thanks for your help in getting Blender ready.

    • jtrudeau

      I highly recommend searching in the Blender community for confirmation, but I pinged a couple of folks here. The answer is, the scene you’re rendering must fit in GPU memory. So it matters depending on the memory requirements of the scene. I don’t have any experience personally with Cycles, so I can’t tell you whether there are many scenes that would exceed the 2GB line. If it doesn’t fit in memory, I’m told you’ll revert to the CPU for calculations, so you won’t see a GPU-compute-based speed improvement.

  15. jtrudeau

    I always like interacting with readers. You all have had excellent ideas and comments, and they are appreciated. We don’t normally edit or modify blogs after release, but this is the exception that proves the rule.

    Based on a couple of specific suggestions/requests, I have added information on Blender Cycles performance on a discrete GPU, using the new OpenCL kernels. So there are a few changes in the text, and in the graph. Bottom line, I should have had that information in there in the first place. 🙂 So thanks again for helping out.

  16. Pavel Petrov

    At last! A week ago I made a fresh Blender build just to see what’s new there… and… SUDDENLY! The Cycles renderer did not hang at start, did not act up while rendering, it just blew me out of my chair =) This is a superb demonstration of the collaboration we all waited for so long! It is also a good example of code optimization for Radeon GPUs (just compare the old Cycles and the new one). Thank you, all you guys and girls at AMD and Blender! I am so pleased I have no English words to say how much I am pleased… heh =)

  17. Rob

    I’ve been using Nvidia cards for a long time now due to Cycles, and I’m glad to see I have the option of picking between two manufacturers now. I might just go AMD next time for the extra VRAM and shader cores. Keep up the good work, guys; this is really helping to level the playing field in ray tracing.

  18. Ray

    I’ve been considering an R9 390 build recently for the VRAM, but I’ve been holding off for the release of Blender 2.76 (released 5 days ago), as it promises wider AMD GPU support and improvements. Does anyone know the current status for the R9 390 (will it work with all Cycles features, SSS, etc.)?

    Status back in May 2015 – http://wiki.blender.org/index.php/OpenCL

  19. Tim

    Thank you for your help. I am surprised AMD stepped in.
    BEFORE:
    I wrote to your forums about Blender/Cycles and the problems with it… the OpenCL compiler and so on was using all 32GB of my RAM and then crashing. If it even compiled, it rendered garbage, and so on. I thought AMD would not care at all. I was about to switch to Nvidia because of this.
    NOW: I am hugely surprised (in a positive way) that AMD stepped in and helped (even though the problem proved to be on the Blender side, not AMD’s, as previously believed). All my GPUs have been ATI/AMD so far. Despite still missing some features, THANKS TO THIS MY NEXT ONE WILL ALSO BE FROM AMD 😉

    P.S.: Good luck, guys, with the Zen CPUs in the future. I have an FX-8350 now and am anticipating the new Zen architecture 🙂

  20. HZP

    Hello Blender artists,
    Please help me with this question: is the ATI Radeon HD 5400 series 2GB graphics card supported in Blender or not?