
@tilkinsc
Last active October 8, 2022 18:34
Getting fully started with wgl
See https://gist.github.com/tilkinsc/7f383faccf3722622f5d0cc9bd45e7e6
@P-Squiddy

bucket rendered objects and update methods into one thread, and have child threads (update/render) act on them as necessary, using mutual-exclusion locks to protect against thread races when needed

This is what trips me up every time I come back to a multithreaded renderer: how the main thread populates the container(s) the render thread uses to actually draw the objects, or to make pipeline adjustments here and there.

For example, each command the render thread needs to execute comes from the main thread, but... how?

The improved version I was talking about just C++-ified this code into classes. Honestly, it is a lot harder to grasp, isn't commented, and has a few compile errors, because it was a WIP that I am not going to fix.

I'd definitely be interested in seeing a more OO / C++ version of this! I have no problem with fixing any compile errors myself; I struggle a lot with the design and architecture side.

@acceleration3

@P-Squiddy The basic architecture of a good multithreaded OpenGL renderer goes like this. You have a scene composed of objects to draw. How you store the objects is up to you, but most developers choose a scene graph: a tree-like structure that represents objects by hierarchy and dependencies. Your goal is then to generate multiple command queues to render these objects, using as many threads as you can. How you arrange these threads is also up to you (by object type, for example, or by creating a thread pool and breaking every object into its own task that gets sent to the pool). Once a thread finishes generating its command queues, you send them to the rendering thread, being mindful that some may need to be in a certain order before submitting. A rough sketch of the idea follows.
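(For illustration only, a minimal sketch of the command-queue pattern; every name here is made up, and it assumes a GL loader such as glad has already been initialized on the render thread:)

```cpp
#include <future>
#include <vector>
#include <glad/glad.h>   // assumed loader; provides the GL 3+ entry points used below

struct DrawCommand {      // plain CPU-side data; recording it needs no GL context
    GLuint program = 0;
    GLuint vao = 0;
    GLint first = 0;
    GLsizei count = 0;
};

using CommandQueue = std::vector<DrawCommand>;

// Worker threads: walk a slice of the scene and record commands (pure CPU work;
// real code would also cull, sort, and batch here).
CommandQueue record(std::vector<DrawCommand> scene_slice)
{
    return scene_slice;
}

// Render thread: the only thread that ever calls GL.
void execute(const std::vector<CommandQueue>& queues)
{
    for (const CommandQueue& q : queues)
        for (const DrawCommand& cmd : q) {
            glUseProgram(cmd.program);
            glBindVertexArray(cmd.vao);
            glDrawArrays(GL_TRIANGLES, cmd.first, cmd.count);
        }
}

// Fan-out / fan-in, e.g. with std::async:
//   std::vector<std::future<CommandQueue>> parts;
//   for (auto& slice : slices)
//       parts.push_back(std::async(std::launch::async, record, slice));
//   std::vector<CommandQueue> queues;
//   for (auto& f : parts) queues.push_back(f.get());  // order fixed before submit
//   execute(queues);
```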

Using a separate shared context in OpenGL nets you the benefit of asynchronous memory transfers, similar to having a separate transfer queue in Vulkan (if you are familiar with that). Buffer and texture uploads can be sent to the thread where the shared context is current, and their completion communicated back to the other threads (with, for example, an std::future).
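(A sketch of that hand-off, with invented names; it assumes drain_uploads runs on a thread where a second context, created to share objects with the render context, is current:)

```cpp
#include <future>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>
#include <glad/glad.h>   // assumed loader, as above

struct UploadJob {
    std::vector<unsigned char> pixels;   // RGBA8 source data
    int width = 0, height = 0;
    std::promise<GLuint> result;         // fulfilled with the texture name
};

static std::mutex g_upload_mutex;
static std::queue<UploadJob> g_uploads;

// Any thread: hand the data off and keep a future for the texture name.
std::future<GLuint> upload_texture_async(std::vector<unsigned char> pixels, int w, int h)
{
    UploadJob job{std::move(pixels), w, h, {}};
    std::future<GLuint> fut = job.result.get_future();
    std::lock_guard<std::mutex> lock(g_upload_mutex);
    g_uploads.push(std::move(job));
    return fut;
}

// Loader thread (shared context current): perform the queued uploads.
void drain_uploads()
{
    for (;;) {
        UploadJob job;
        {
            std::lock_guard<std::mutex> lock(g_upload_mutex);
            if (g_uploads.empty()) return;
            job = std::move(g_uploads.front());
            g_uploads.pop();
        }
        GLuint tex = 0;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, job.width, job.height,
                     0, GL_RGBA, GL_UNSIGNED_BYTE, job.pixels.data());
        glFinish();                      // ensure the transfer completed before signaling
        job.result.set_value(tex);       // name is now usable from the sharing context
    }
}
```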

This is a significant effort, but it improves rendering performance by quite a lot. I hope this helps make it clearer for you.

@acceleration3

acceleration3 commented Oct 6, 2022

@tilkinsc I don't understand why you are deleting my feedback. I'm not trying to be condescending or anything; I'm just trying to help you out, since these things aren't very well documented. Or rather, they are, but the documentation isn't all in one place, nor is it easy to find.

  • The checks to see whether a window was created are superfluous. WM_CREATE is one of the few window messages that doesn't require the message loop to run in order to be delivered: the CreateWindow calls themselves invoke the WndProc with WM_CREATE before even returning (you can verify this yourself with breakpoints; the first sketch after this list demonstrates it), meaning everything executed after CreateWindow already happens after the window was created. Your loop won't even be entered to begin with, because temp_window_created will already be true before it starts.

  • You are using PeekMessage, a non-blocking function, inside a busy loop to handle the main window's messages. This is a waste of CPU time, since it constantly polls for messages that most of the time aren't in the queue yet. You should be using the blocking GetMessage instead, which waits until messages actually arrive in the queue (see the second sketch after this list). The CPU usage difference is significant (at least one CPU core's worth). Not only that, but GetMessage also tells you when the window has closed, meaning you don't have to use the boolean value you are setting (window_active).

  • You need to check that extensions are present before calling some of the functions you are calling; wglChoosePixelFormatARB, wglCreateContextAttribsARB, and wglSwapIntervalEXT are your main offenders. On older OpenGL implementations where these functions aren't present, you need to fall back to their GDI counterparts (ChoosePixelFormat and wglCreateContext); a fallback sketch follows this list. You are also not using the extension WGL_EXT_pixel_format, which would be the only such extension to exist in earlier OpenGL drivers.

  • You create a context at line 417 (gl_context) which you make current on the main application thread, but you never do anything with it (no OpenGL functions are called on the main thread), so it isn't really needed.

  • You do some odd repeated glClear and SwapBuffers calls before rendering in the render thread, which aren't needed.

  • You use glFlush before SwapBuffers, which isn't needed: SwapBuffers does an implicit glFlush (and so does wglMakeCurrent on the current context, just before switching). You can check the disassembly of SwapBuffers and see for yourself. If you don't have a disassembler or don't know reverse engineering, measure how long glFlush takes to execute the context's pending commands and how long SwapBuffers takes (the last sketch after this list shows one way). You can then remove glFlush and notice that everything still works; not only that, but SwapBuffers now takes more time to execute, because the time glFlush would have taken is added to SwapBuffers'.
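(First sketch: a tiny self-contained program, with a hypothetical g_created flag, showing that WM_CREATE is delivered synchronously, while CreateWindow is still on the stack:)

```cpp
#include <windows.h>
#include <cassert>

static bool g_created = false;      // illustrative flag, mirrors temp_window_created

static LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    if (msg == WM_CREATE) {
        g_created = true;           // runs before CreateWindow returns
        return 0;
    }
    return DefWindowProc(hwnd, msg, wp, lp);
}

int main()
{
    WNDCLASS wc = {};
    wc.lpfnWndProc = WndProc;
    wc.hInstance = GetModuleHandle(NULL);
    wc.lpszClassName = TEXT("DemoClass");
    RegisterClass(&wc);

    HWND hwnd = CreateWindow(TEXT("DemoClass"), TEXT("demo"), WS_OVERLAPPEDWINDOW,
                             CW_USEDEFAULT, CW_USEDEFAULT, 640, 480,
                             NULL, NULL, wc.hInstance, NULL);
    assert(hwnd && g_created);      // already true: no message loop has run yet
    return 0;
}
```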
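(Second sketch: the standard blocking loop. GetMessage returns 0 once WM_QUIT arrives, typically posted by PostQuitMessage from a WM_DESTROY handler, so no extra window_active flag is needed:)

```cpp
#include <windows.h>

// Runs on the thread that owns the window. PostQuitMessage(0) in WM_DESTROY
// makes GetMessage return 0 and ends the loop.
void message_loop()
{
    MSG msg;
    BOOL result;
    // GetMessage blocks until a message arrives; returns 0 on WM_QUIT, -1 on error.
    while ((result = GetMessage(&msg, NULL, 0, 0)) > 0) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
}
```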
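(Third sketch: the extension check and GDI fallback, assuming a context is already current so wglGetProcAddress can resolve entry points; the typedef and attribute values follow the WGL_ARB_pixel_format spec:)

```cpp
#include <windows.h>
#include <GL/gl.h>

typedef BOOL (WINAPI *PFNWGLCHOOSEPIXELFORMATARBPROC)(
    HDC, const int*, const FLOAT*, UINT, int*, UINT*);

// Pick a pixel format, preferring the ARB path and falling back to plain GDI.
int choose_format(HDC dc, const PIXELFORMATDESCRIPTOR* pfd)
{
    PFNWGLCHOOSEPIXELFORMATARBPROC wglChoosePixelFormatARB =
        (PFNWGLCHOOSEPIXELFORMATARBPROC)wglGetProcAddress("wglChoosePixelFormatARB");

    if (wglChoosePixelFormatARB) {      // NULL when the driver lacks the extension
        const int attribs[] = {
            0x2001 /* WGL_DRAW_TO_WINDOW_ARB */, GL_TRUE,
            0x2010 /* WGL_SUPPORT_OPENGL_ARB */, GL_TRUE,
            0x2011 /* WGL_DOUBLE_BUFFER_ARB  */, GL_TRUE,
            0
        };
        int format = 0;
        UINT count = 0;
        if (wglChoosePixelFormatARB(dc, attribs, NULL, 1, &format, &count) && count > 0)
            return format;
    }
    return ChoosePixelFormat(dc, pfd);  // GDI fallback for old drivers
}
```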
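(Last sketch: a rough way to see the glFlush/SwapBuffers cost shift without a disassembler. It assumes a context is current on hdc; wall-clock timing like this is crude but makes the point visible:)

```cpp
#include <windows.h>
#include <GL/gl.h>
#include <cstdio>

void time_flush_and_swap(HDC hdc)
{
    LARGE_INTEGER freq, t0, t1, t2;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t0);
    glFlush();                // drains the pending command queue
    QueryPerformanceCounter(&t1);
    SwapBuffers(hdc);         // does an implicit flush of its own
    QueryPerformanceCounter(&t2);

    printf("glFlush: %.3f ms, SwapBuffers: %.3f ms\n",
           1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart,
           1000.0 * (t2.QuadPart - t1.QuadPart) / freq.QuadPart);
    // Drop the glFlush() call and SwapBuffers' time grows by roughly
    // the amount glFlush was taking.
}
```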

@tilkinsc
Author

tilkinsc commented Oct 7, 2022

Because your comments didn't add anything, not even a little bit. They weren't even coherent. This isn't meant to be documentation; these are my gists and I save code here. This is also an old version of this file. A more recent one can be found in my gists, but it has commented-out code from when I was experimenting with AMD extensions for WGL. This post was made 13 months ago.

Good to know that Windows isn't consistent about this; I would love to drop those checks. I originally developed this from reading what little wiki material there was.

I know. Again, this wasn't meant to be a credible example; it's a personal paste.

No, you do not need to check for those extensions being present. WGL extensions are so widely supported, even by the crappiest of computers, that they will always be available. If the machine running the program doesn't have the capability, that isn't my problem. Turn your attention towards OpenGL extension checking instead. I have no intention of supporting such computers. That's a personal note.

Remember, this is just getting contexts created, not actually doing anything. A context needs to be created before you can use wglChoosePixelFormatARB and friends; from there you can create another context with the correct pixel format (the usual two-step bootstrap is sketched below). On this I am not wrong. Without it, this example doesn't even work, or shouldn't.
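(For readers landing here from search, the two-step bootstrap being described goes roughly like this. It's a sketch: temp_window and real_window are assumed to be already-created HWNDs, error checking is omitted, and the attribute values are from the WGL_ARB_create_context spec. Two windows are used because a window's pixel format can only be set once:)

```cpp
#include <windows.h>
#include <GL/gl.h>

typedef HGLRC (WINAPI *PFNWGLCREATECONTEXTATTRIBSARBPROC)(HDC, HGLRC, const int*);

HGLRC bootstrap_context(HWND temp_window, HWND real_window)
{
    // Step 1: legacy context on a throwaway window, only to load extensions.
    PIXELFORMATDESCRIPTOR pfd = { sizeof(pfd), 1 };
    pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;

    HDC temp_dc = GetDC(temp_window);
    SetPixelFormat(temp_dc, ChoosePixelFormat(temp_dc, &pfd), &pfd);
    HGLRC temp_rc = wglCreateContext(temp_dc);
    wglMakeCurrent(temp_dc, temp_rc);

    // Extension entry points only resolve once *some* context is current.
    PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB =
        (PFNWGLCREATECONTEXTATTRIBSARBPROC)wglGetProcAddress("wglCreateContextAttribsARB");

    // Step 2: real context on the real window (format selection with
    // wglChoosePixelFormatARB would also happen here; elided for brevity).
    HDC real_dc = GetDC(real_window);
    SetPixelFormat(real_dc, ChoosePixelFormat(real_dc, &pfd), &pfd);
    const int attribs[] = {
        0x2091 /* WGL_CONTEXT_MAJOR_VERSION_ARB */, 3,
        0x2092 /* WGL_CONTEXT_MINOR_VERSION_ARB */, 3,
        0
    };
    HGLRC real_rc = wglCreateContextAttribsARB(real_dc, NULL, attribs);

    // Tear the temporary context down and switch to the real one.
    wglMakeCurrent(NULL, NULL);
    wglDeleteContext(temp_rc);
    ReleaseDC(temp_window, temp_dc);
    wglMakeCurrent(real_dc, real_rc);
    return real_rc;
}
```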

This clear isn't weird: it ensures the window has a certain color before it becomes visible, by setting the color of both buffers. While the calls aren't strictly needed, again, this is my code and what I like.

Didn't know about SwapBuffers flushing.

Here is the gist: https://gist.github.com/tilkinsc/7f383faccf3722622f5d0cc9bd45e7e6

@tilkinsc
Author

tilkinsc commented Oct 7, 2022

Check it out I did a huge update.

I also came to the conclusion that having multiple render threads is more work than is needed, creates race conditions, breaks OpenGL's and the drivers' ability to optimize, and has little point because nearly everything is already async. I think the only time you'd want multiple threads and contexts is when you have multiple windows.

@acceleration3

I didn't realize you had moved past this gist; I apologize. However, this gist is what shows up on Google when you search "WGL", which explains the level of engagement this particular gist gets.

You are right that using multiple threads is fighting against the design of OpenGL, but it DOES net you a significant performance boost; it just doesn't come entirely from the GPU (only the async DMA part does). OpenGL draw commands can only be submitted by one thread, and that can't be changed because that's how GPUs work in general. But if you generate commands on worker threads and have them executed on the render thread, you are utilizing much more of your CPU, not to mention the async DMA you get from a shared context on a separate thread.

In fact, this is what Vulkan does: while OpenGL hides the "command queues" behind context objects, Vulkan gives you direct access to command buffers. You mention drivers breaking, but they have actually evolved to accept this kind of usage (and OpenGL's API has too), since developers use OpenGL more and more like this. It even helps towards making an abstraction layer over OpenGL, Vulkan, and DirectX 12.

If you want I can provide an example of OpenGL threading with software command queues and an async DMA thread.

@P-Squiddy

If you want I can provide an example of OpenGL threading with software command queues and an async DMA thread.

I would be interested in seeing this myself!
