See https://gist.github.com/tilkinsc/7f383faccf3722622f5d0cc9bd45e7e6
You need mutual exclusion (a mutex) for anything that more than one thread accesses at the same time. But the volatile keyword should do fine for most cases (it prevents compiler optimization so your symbol doesn't get optimized away). A multithreaded renderer serves several purposes:
- Evenly distribute render load
- Avoid thrashing a single core
- Speed things up where possible
- Keep game logic separate, even though game logic sometimes requires OpenGL
- Dedicate a thread to rendering a specific set of things
You basically need to do what I did above, but copy the create-thread part again. Any data that is shared between two threads should be marked volatile and use mutexes as necessary. One thing to note is that you can't guarantee the context you send over to the other thread is ready to use; touching it too early is undefined behavior. This can happen when the main thread doesn't lock up or delay after spinning up a render thread. When you render something using the same context over multiple threads, you have to treat it as another render. If you use two or more contexts across more than one thread, you might want to consider wglShareLists(glrc1, glrc2), which shares resources between the two and saves you the hassle of reuploading your data. This can be really handy when you only set the state of a context once, so that you can shove it off to a thread.
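For example, the handoff could look roughly like this (a minimal sketch; g_hdc, g_context, render_thread_main and the event are placeholder names, not the actual symbols from the gist):

```cpp
#include <windows.h>

static HDC    g_hdc;                      // window DC, set up during WGL bootstrap
static HGLRC  g_context;                  // the context being handed to the render thread
static HANDLE g_context_taken;            // signaled once the render thread owns the context
static volatile LONG g_running = 1;       // cleared by the main thread on shutdown

static DWORD WINAPI render_thread_main(LPVOID)
{
	wglMakeCurrent(g_hdc, g_context);     // take ownership of the context on this thread
	SetEvent(g_context_taken);            // tell the main thread the handoff is done
	while (g_running) {
		// ... draw ...
		SwapBuffers(g_hdc);
	}
	wglMakeCurrent(NULL, NULL);
	return 0;
}

static void spawn_render_thread(void)
{
	wglMakeCurrent(NULL, NULL);           // release the context on the main thread first
	g_context_taken = CreateEvent(NULL, TRUE, FALSE, NULL);
	CreateThread(NULL, 0, render_thread_main, NULL, 0, NULL);
	// block until the render thread has made the context current,
	// so the main thread can't race it into undefined behavior
	WaitForSingleObject(g_context_taken, INFINITE);
}
```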
In hindsight, I do have a version of this that was improved, I think.
It would be great to see an improved version if it's not too much trouble!
After reviewing the code, I have determined that since OpenGL calls are async (except for the obvious few), you should just bucket rendered objects and update methods into one thread and have child threads (update/render) acting on them as necessary, using mutual exclusion locks to protect against thread races when needed (one read, one write, but the read happens after the write when it was expected to happen before it) and volatile keywords to prevent optimization. It's up to you if you want to generate two or more contexts. I recommend you do, as sharing a single context requires context-bind tracking, which probably just reduces the performance of your code. You can further reduce this with wglShareLists as mentioned. I would say you break code out into its own thread when you notice you are mutating the OpenGL state back and forth to a certain way each frame.
The improved version I was talking about just C++ified this code into classes, which honestly is a lot harder to grasp, isn't commented, and has a few compile errors because it was a WIP that I am not going to fix. The concept of creating a new thread is already covered above.
What the above code needs to do differently:
- Create new contexts in the spawned thread and bind them
- wglShareLists
- Drop the reference to the initial context, or keep it around to spawn a new one in case of context loss, and plan what to do when the main thread loses its context (damn drivers)
So not a lot of code, probably 10 lines of non-boilerplate.
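Roughly like this inside the spawned thread (a sketch under the same assumptions as the gist: the DC already has its pixel format set, and the symbol names here are illustrative):

```cpp
#include <windows.h>

static HDC   g_hdc;                // window DC from the WGL bootstrap
static HGLRC g_initial_context;    // the context created on the main thread

static DWORD WINAPI render_thread_main(LPVOID)
{
	// create and bind a fresh context on the spawned thread itself
	HGLRC my_context = wglCreateContext(g_hdc);
	// share textures/buffers/display lists with the initial context
	wglShareLists(g_initial_context, my_context);
	wglMakeCurrent(g_hdc, my_context);

	// ... render loop ...

	wglMakeCurrent(NULL, NULL);
	wglDeleteContext(my_context);
	return 0;
}
```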
bucket rendered objects and update methods into one thread and have child threads (update/render) acting on them as necessary using mutual exclusion locks to protect thread racing when needed
This is what sticks me every time I come back to a multithreaded renderer -- that is, how the main thread populates the container(s) the render thread uses to actually draw the objects, or to make pipeline adjustments here and there.
For example, every command the render thread needs to execute comes from the main thread, but... how?
The improved version I was talking about just C++ified this code into classes which honestly is a lot harder to grasp, not commented, and has a few compile errors because it was a WIP that I am not going to fix
I'd definitely be interested in seeing a more OO / C++ version of this! I have no problem with fixing any compile errors myself; I struggle a lot with the design and architecture side.
@P-Squiddy The basic architecture of a good OpenGL multithreaded renderer goes like this. You have a scene composed of objects to draw. The way you store the objects is up to you, but most developers choose a graph, a tree-like structure containing objects organized by hierarchy and dependencies. Your goal is then to generate multiple command queues to render these objects using as many threads as you can. You have the say on how to arrange these threads (by object type, for example, or even by creating a thread pool and breaking every object into its own task that gets sent to the pool). Once a thread finishes generating its command queues, you send them to the rendering thread, being mindful that some might need to be in a certain order before submitting.
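A very reduced sketch of that idea (DrawCommand, record_commands and the scene slices are all invented here for illustration; a real renderer would record packed command structs rather than std::function, but the shape is the same):

```cpp
#include <functional>
#include <future>
#include <vector>

// placeholder scene object; a real node would carry VAOs, materials, transforms, ...
struct Object {
	void draw() const { /* glBindVertexArray(...); glDrawElements(...); */ }
};

using DrawCommand   = std::function<void()>;   // wraps the GL calls for one object
using CommandBuffer = std::vector<DrawCommand>;

// runs on a worker thread: nothing here touches GL, it only *records* work
CommandBuffer record_commands(const std::vector<const Object*>& slice)
{
	CommandBuffer buffer;
	for (const Object* obj : slice)
		buffer.push_back([obj] { obj->draw(); });
	return buffer;
}

// runs on the render thread, the only thread that owns the GL context
void render_frame(const std::vector<std::vector<const Object*>>& slices)
{
	// fan command generation out to worker threads
	std::vector<std::future<CommandBuffer>> pending;
	for (const auto& slice : slices)
		pending.push_back(std::async(std::launch::async, record_commands, std::cref(slice)));

	// replay the buffers in order; actual GL submission only happens here
	for (auto& fut : pending)
		for (const DrawCommand& cmd : fut.get())
			cmd();

	// SwapBuffers(hdc);
}
```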
Using a separate shared context in OpenGL nets you the benefit of asynchronous memory transfers, similar to having a separate transfer queue in Vulkan (if you are familiar with that). Buffer and texture uploads can instead be sent to the thread where the shared context is current, and on completion the result is communicated back to the other threads (with, for example, an std::future).
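A hedged sketch of that upload thread (it assumes a second context is current on this thread and shares resources with the render context via wglShareLists; UploadJob and the queue names are made up for the example):

```cpp
#include <windows.h>
#include <GL/gl.h>
#include <future>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

struct UploadJob {
	std::vector<unsigned char> pixels;
	int width = 0, height = 0;
	std::promise<GLuint> result;          // fulfilled once the texture exists
};

std::mutex            upload_mutex;
std::queue<UploadJob> upload_queue;       // filled by the render/game threads

// runs on the thread where the *shared* context is current
void upload_thread_loop()
{
	for (;;) {
		UploadJob job;
		{
			std::lock_guard<std::mutex> lock(upload_mutex);
			if (upload_queue.empty()) continue;   // a real version would wait on a condition variable
			job = std::move(upload_queue.front());
			upload_queue.pop();
		}
		GLuint tex = 0;
		glGenTextures(1, &tex);
		glBindTexture(GL_TEXTURE_2D, tex);
		glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, job.width, job.height,
		             0, GL_RGBA, GL_UNSIGNED_BYTE, job.pixels.data());
		glFinish();                       // make sure the upload finished before other contexts use it
		job.result.set_value(tex);        // the waiting thread's std::future becomes ready
	}
}
```

The other thread builds an UploadJob, grabs job.result.get_future() before pushing it into the queue, and then waits on or polls that future instead of blocking on the upload itself.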
This is a significant effort, but it improves rendering performance by quite a lot. I hope I helped make it clear for you.
@tilkinsc I don't understand why you are deleting my feedback. I'm not trying to be condescending or anything; I'm just trying to help you out, since these things aren't very well documented. Or rather, they are, but the documentation isn't all in one place, nor is it that easy to find.
- The checks to see if a window is created are superfluous. WM_CREATE is one of the only window messages that don't require the message loop to run for them to be executed. The CreateWindow calls themselves call the WndProc with WM_CREATE before even returning (you can even check this yourself with breakpoints), meaning everything executed after CreateWindow already happens after the window was created. Your loop won't even be entered to begin with, because temp_window_created will already be true before it starts.
- You are using PeekMessage, a non-blocking function, inside a busy loop to handle the main window's messages. This is a waste of CPU time, as it constantly tries to get messages that most of the time aren't in the queue yet. You should be using the blocking GetMessage instead, which blocks until messages actually arrive in the queue. The CPU usage difference is significant (at least one CPU core's worth). Not only that, but GetMessage also provides you with a way to know when the window closed, meaning you don't have to use the boolean value you are setting (window_active).
- You need to check whether extensions are present for some of the functions you are calling. wglChoosePixelFormatARB, wglCreateContextAttribsARB and wglSwapIntervalEXT are your main offenders. In older OpenGL implementations where these functions aren't present, you need to fall back to their GDI counterparts (ChoosePixelFormat and wglCreateContext). You are also not using the extension WGL_EXT_pixel_format, which would be the only extension to exist in earlier OpenGL drivers.
- You create a context at line 417 (gl_context) which you make current on the main application thread, but you never do anything with it (no OpenGL functions are called in the main thread), so it isn't really needed.
- You do some weird repeated glClear and SwapBuffers calls before rendering in the render thread, which aren't needed.
- You use glFlush before SwapBuffers, which isn't needed. SwapBuffers does an implicit glFlush (and so does wglMakeCurrent on the current context, just before switching). You can check the disassembly of SwapBuffers and verify for yourself. If you don't have a disassembler or don't know reverse engineering, measure the time glFlush takes to execute the context's pending commands and how much time SwapBuffers takes to execute. You can then remove glFlush and notice that everything still works; not only that, but now SwapBuffers takes more time to execute, which is the time the glFlush would have taken, added to SwapBuffers'.
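For the second point, a blocking loop looks roughly like this (a minimal sketch, not the gist's actual code):

```cpp
#include <windows.h>

void run_message_loop(void)
{
	MSG msg;
	BOOL result;
	// GetMessage blocks until a message arrives, returns 0 on WM_QUIT and -1 on error,
	// so the loop itself tells you when the window is gone -- no window_active flag needed
	while ((result = GetMessage(&msg, NULL, 0, 0)) != 0) {
		if (result == -1)
			break;                        // e.g. invalid window handle
		TranslateMessage(&msg);
		DispatchMessage(&msg);
	}
	// msg.wParam now holds the exit code passed to PostQuitMessage
}
```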
Because your comments didn't add anything, not even a little bit. They weren't even coherent. This isn't meant to be documentation; these are my gists and I save code here. This is also an old version of this file. A more recent one can be found in my gists, but it has commented code from when I was experimenting with AMD extensions for WGL. This post was made 13 months ago.
Good to know that Windows isn't consistent in this and I would love to drop those checks. I originally developed this from reading what little wiki there was.
I know. Again, this wasn't meant to be a credible example; it's a personal paste.
No. You do not need to check for extensions to be present. WGL extensions are TOO widely supported, even by the crappiest of computers, and will always be available. If the person running the program doesn't have the capabilities to do so, then it isn't my problem. Turn your cheek over towards OpenGL extension checking instead. I have no intention of supporting such computers. That's on a personal note.
Remember, this is just getting contexts created, not actually doing anything. A context originally needs to be created before you can use wglChoosePixelFormatARB and friends. From there you can create another context with the correct pixel format. This I am not wrong on. This example doesn't even work, or shouldn't work.
This clear isn't weird. It ensures the window has a certain color before it is visible. It is setting the color of both buffers. While they aren't needed, again, this is my code and what I like.
Didn't know about SwapBuffers flushing.
Here is the gist: https://gist.github.com/tilkinsc/7f383faccf3722622f5d0cc9bd45e7e6
Check it out I did a huge update.
I also came to the conclusion that having multiple render threads is more work than is needed, creates race conditions, breaks OpenGL's and the driver's ability to optimize, and has no point because nearly everything is async. I think the only time you would want multiple threads and contexts is when you have multiple windows.
I didn't realize you had moved past this gist, I apologize. However, this gist is what shows up on Google when you search "WGL", which explains the level of engagement this particular gist gets.
You are right that using multiple threads is fighting against the design of OpenGL, but it DOES net you a significant performance boost; it just doesn't come entirely from the GPU (only the async DMA part). OpenGL draw commands can only be sent by one thread, and that can't be changed because that's how GPUs work in general, but if you generate commands on worker threads and get them executed on the render thread, you are utilizing much more of your CPU, not to mention the async DMA you get from a shared context in a separate thread. In fact, this is what Vulkan does: while OpenGL hides the "command queues" behind context objects, Vulkan gives you access to command buffers you can use directly. You mention drivers breaking, but actually they have evolved to accept this kind of process (and OpenGL's API has too), since developers are using OpenGL more and more like this, which even helps toward making an abstraction layer over OpenGL, Vulkan and DirectX 12.
If you want I can provide an example of OpenGL threading with software command queues and an async DMA thread.
If you want I can provide an example of OpenGL threading with software command queues and an async DMA thread.
I would be interested in seeing this myself!
Really appreciate the example you've provided here. I've been curious what OpenGL looks like without GLFW or something similar.
I, having very limited knowledge of OpenGL, and even less knowledge of multi-threaded rendering, would find it quite useful if an update could be made to demonstrate how to update an object on one thread, and render it on another.
Thanks again.