@tilkinsc
Last active October 8, 2022 18:34
Getting fully started with wgl
See https://gist.github.com/tilkinsc/7f383faccf3722622f5d0cc9bd45e7e6
@tilkinsc

Output:

C:\programming\VSCode\wglexp>test2
Pixel format: 11
*temporary*
GL_RENDERER: NVIDIA GeForce GTX 1080/PCIe/SSE2
GL_VENDOR: NVIDIA Corporation
GL_VERSION: 4.6.0 NVIDIA 471.41
GL_SHADING_LANGUAGE_VERSION: 4.60 NVIDIA      

*final*
GL_RENDERER: NVIDIA GeForce GTX 1080/PCIe/SSE2
GL_VENDOR: NVIDIA Corporation
GL_VERSION: 4.6.0 NVIDIA 471.41
GL_SHADING_LANGUAGE_VERSION: 4.60 NVIDIA      
GL_CONTEXT_PROFILE_MASK: 1

@taniwha

taniwha commented Mar 20, 2022

Thank you for this. I am working on updating QuakeForge's GL support to 3.0 for the old renderer and probably 4.6 for the new, and didn't want to abandon windows support after having gotten it working again last year.
It took a couple of extra defines to get this working in mxe:

#define __STRSAFE__NO_INLINE
#include <strsafe.h>

and

#define GLEW_STATIC
#include "GL/glew.h"

The former is possibly because I do only static builds (or maybe the gcc version, as it was an issue with inlining causing duplicate definition errors when linking), and the latter is definitely because of the static builds. However, with those tweaks, it seems to run nicely (under wine64).

@tilkinsc

@taniwha I'm glad this helps you.

@P-Squiddy

Really appreciate the example you've provided here. I've been curious what OpenGL looks like without GLFW or something similar.

Having very limited knowledge of OpenGL, and even less knowledge of multi-threaded rendering, I would find it quite useful if an update could be made demonstrating how to update an object on one thread and render it on another.

Thanks again.

@tilkinsc

tilkinsc commented Oct 2, 2022

You need mutual exclusion (a mutex) for the things that more than one thread can't safely access at the same time. But the volatile keyword should do fine for most cases (it prevents the compiler from optimizing away your symbol). A multithreaded renderer has several purposes:

  1. Evenly distribute the render load
  2. Avoid thrashing a single core
  3. Possible speedups
  4. Keeping game logic separate, even though game logic sometimes requires OpenGL
  5. Dedicating a thread to rendering a specific set of things

You basically need to do what I did above, but copy the create-thread part again. Any data that is shared between two threads should be marked volatile and protected with mutexes as necessary. One thing to note is that you can't assume the context you send over to the other thread is ready to use; touching it too early is undefined behavior. This can happen when the main thread doesn't lock or delay after spinning up a render thread. When you render something using the same context over multiple threads, you have to treat it as another render. If you use two or more contexts between more than one thread, you might want to consider wglShareLists(glrc1, glrc2), which will share resources between the two and save you the hassle of re-uploading your data. This can be really handy when you only set the state of a context once, so that you can shove it to a thread.
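
A minimal sketch of that setup, assuming Win32 threads and two contexts on the same window DC. The names RenderThreadArgs, render_thread_proc and the running flag are illustrative, not from the gist, and error handling is omitted:

#include <windows.h>
#include <GL/gl.h>

typedef struct {
    HDC   hdc;
    HGLRC glrc;
} RenderThreadArgs;

static volatile LONG running = 1;   /* shared stop flag, cleared by the main thread */

DWORD WINAPI render_thread_proc(LPVOID param)
{
    RenderThreadArgs* args = (RenderThreadArgs*) param;
    wglMakeCurrent(args->hdc, args->glrc);      /* a context can only be current on one thread */
    while (running) {
        glClear(GL_COLOR_BUFFER_BIT);
        /* ... draw ... */
        SwapBuffers(args->hdc);
    }
    wglMakeCurrent(NULL, NULL);
    wglDeleteContext(args->glrc);
    return 0;
}

/* On the main thread, after the window, pixel format and main context exist: */
void spawn_render_thread(HDC hdc, HGLRC main_glrc, RenderThreadArgs* args)
{
    HGLRC render_glrc = wglCreateContext(hdc);
    wglShareLists(main_glrc, render_glrc);      /* textures/buffers become visible to both */
    args->hdc  = hdc;
    args->glrc = render_glrc;
    CreateThread(NULL, 0, render_thread_proc, args, 0, NULL);
}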

@tilkinsc

tilkinsc commented Oct 3, 2022

As an afterthought, I do have a version of this that was improved, I think.

@P-Squiddy

It would be great to see an improved version if it's not too much trouble!

@tilkinsc

tilkinsc commented Oct 5, 2022

After reviewing the code I have determined that, since OpenGL calls are async (except for the obvious few), you should just bucket rendered objects and update methods into one thread and have child threads (update/render) acting on them as necessary, using mutual exclusion locks to protect against thread races when needed (one read, one write, but the read happens after the write while expecting to come before it) and volatile keywords to prevent optimization. It's up to you whether you want to generate two or more contexts. I recommend you do, as sharing a single context across threads requires tracking which thread has it bound, which probably just reduces the performance of your code. You can further reduce this with wglShareLists as mentioned. I would say you break code out into its own thread when you notice you are mutating the OpenGL state back and forth to a certain way each frame.

The improved version I was talking about just C++ified this code into classes, which honestly is a lot harder to grasp, is not commented, and has a few compile errors because it was a WIP that I am not going to fix. The concept of creating a new thread is covered above already.

What the above code needs to do different:

  1. Create new contexts in the spawned thread and bind them
  2. wglShareLists
  3. Drop the reference to the initial context, or keep it to spawn a new one in case of context loss, and plan what to do when the main thread loses its context (damn drivers)

So not a lot of code, probably 10 lines of non-boilerplate.
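
A variation of the earlier sketch covering those three steps, this time creating the context inside the spawned thread. ThreadArgs and render_thread are illustrative names, and the same <windows.h> header is assumed:

/* Hypothetical argument struct: the window DC plus the main thread's context. */
typedef struct { HDC hdc; HGLRC main_glrc; } ThreadArgs;

DWORD WINAPI render_thread(LPVOID param)
{
    ThreadArgs* args = (ThreadArgs*) param;

    HGLRC glrc = wglCreateContext(args->hdc);   /* 1. new context, created in the spawned thread */
    wglShareLists(args->main_glrc, glrc);       /* 2. share resources before uploading any data */
    wglMakeCurrent(args->hdc, glrc);            /*    ... and bind it on this thread */

    /* 3. the main thread can now drop its context, or keep it around so a
          replacement can be spawned if the driver ever loses the context */

    /* ... render loop ... */

    wglMakeCurrent(NULL, NULL);
    wglDeleteContext(glrc);
    return 0;
}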

@P-Squiddy

bucket rendered objects and update methods into one thread and have child threads (update/render) acting on them as necessary, using mutual exclusion locks to protect against thread races when needed

This is what trips me up every time I come back to a multithreaded renderer -- that is, how the main thread populates the container(s) the render thread uses to actually draw the objects, or makes pipeline adjustments here and there.

Such as: each command the render thread needs to execute comes from the main thread, but... how?

The improved version I was talking about just C++ified this code into classes, which honestly is a lot harder to grasp, is not commented, and has a few compile errors because it was a WIP that I am not going to fix

I'd definitely be interested in seeing a more OO / C++ version of this! I have no problem with fixing any compile errors myself; I struggle a lot with the design and architecture side.

@acceleration3

@P-Squiddy The basic architecture of a good OpenGL multithreaded renderer goes like this. You have a scene composed of objects to draw. The way you store the objects is up to you, but most developers choose a graph, a tree-like structure containing objects organized by hierarchy and dependencies. Your goal is then to generate multiple command queues to render these objects using as many threads as you can. You have the say on how to arrange these threads (by object type, for example, or even by creating a thread pool and breaking every object up into its own task that gets sent to the pool). Once a thread finishes generating its command queues, you send them to the rendering thread, being mindful that some might need to be in a certain order before submitting.

Using a separate shared context in OpenGL nets you the benefit of asynchronous memory transfers, similar to having a separate transfer queue in Vulkan (if you are familiar with that). Buffer and texture uploads can be sent to the thread where the shared context is current instead, and on completion communicated back to the other threads (with, for example, an std::future).

This is a significant effort, but it improves rendering performance by quite a lot. I hope this helps make it clear for you.
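
A rough illustration of the software command-queue idea, in plain C to match the gist rather than C++. DrawCmd, CmdQueue and the fixed ring size are illustrative; a real renderer would carry more state per command and call InitializeCriticalSection on the lock once at startup:

#include <windows.h>

typedef struct {
    unsigned vao;        /* what to bind */
    unsigned program;    /* which shader to use */
    int      first, count;
} DrawCmd;

typedef struct {
    DrawCmd          cmds[1024];
    int              head, tail;
    CRITICAL_SECTION lock;       /* InitializeCriticalSection(&q->lock) before first use */
} CmdQueue;

/* Worker threads walk their slice of the scene and push commands. */
void cmd_push(CmdQueue* q, DrawCmd cmd)
{
    EnterCriticalSection(&q->lock);
    q->cmds[q->tail % 1024] = cmd;
    q->tail++;
    LeaveCriticalSection(&q->lock);
}

/* The render thread, the only one issuing GL calls, drains the queue each frame. */
int cmd_pop(CmdQueue* q, DrawCmd* out)
{
    int have = 0;
    EnterCriticalSection(&q->lock);
    if (q->head != q->tail) {
        *out = q->cmds[q->head % 1024];
        q->head++;
        have = 1;
    }
    LeaveCriticalSection(&q->lock);
    return have;
}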

@acceleration3

acceleration3 commented Oct 6, 2022

@tilkinsc I don't understand why you are deleting my feedback. I'm not trying to be condescending or anything; rather, I'm just trying to help you out, since these things aren't very well documented. Or rather they are, but the documentation isn't all in one place, nor is it that easy to find.

  • The checks to see if a window is created are superfluous. WM_CREATE is one of the only window messages that don't require the message loop to run for them to be executed. The CreateWindow calls themselves invoke the WndProc with WM_CREATE before even returning (you can check this yourself with breakpoints), meaning everything executed after CreateWindow already happens after the window was created. Your loop won't even be entered to begin with, because temp_window_created will already be true before it starts.

  • You are using PeekMessage, a non-blocking function, inside a busy loop to handle the main window's messages. This is a waste of CPU time, as it constantly tries to get messages that most of the time aren't in the queue yet. You should be using the blocking GetMessage instead, which blocks until messages actually arrive in the queue. The CPU usage difference is significant (at least one CPU core's worth). Not only that, but GetMessage also provides you with a way to know when the window closed, meaning you don't have to use the boolean value you are setting (window_active). A minimal loop is sketched after this list.

  • You need to check whether extensions are present for some of the functions you are calling. wglChoosePixelFormatARB, wglCreateContextAttribsARB and wglSwapIntervalEXT are your main offenders. In older OpenGL implementations where these functions aren't present, you need to fall back to their GDI counterparts (ChoosePixelFormat and wglCreateContext). You are also not using the extension WGL_EXT_pixel_format, which would be the only such extension to exist in earlier OpenGL drivers. A hedged check is sketched after this list.

  • You create a context at line 417 (gl_context) which you make current on the main application thread, but you never do anything with it (no OpenGL functions are called on the main thread), so it isn't really needed.

  • You do some weird repeated glClear and SwapBuffers calls before rendering in the render thread, which aren't needed.

  • You use glFlush before SwapBuffers, which isn't needed. SwapBuffers does an implicit glFlush (and so does wglMakeCurrent on the current context, just before switching). You can check the disassembly of SwapBuffers and see for yourself. If you don't have a disassembler or don't know reverse engineering, measure the time glFlush takes to execute the context's pending commands and how much time SwapBuffers takes to execute. You can then remove glFlush and notice that everything still works, and not only that, but now SwapBuffers takes more time to execute, which is the time glFlush would have taken being added to SwapBuffers'.
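
On the PeekMessage point, a minimal blocking loop might look like the sketch below (not the gist's code); GetMessage returns 0 once WM_QUIT is posted and -1 on error, so no separate flag is needed:

MSG msg;
BOOL result;
/* Blocks until a message arrives, so the thread sleeps instead of spinning. */
while ((result = GetMessage(&msg, NULL, 0, 0)) != 0) {
    if (result == -1) {
        break;  /* GetMessage failed */
    }
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}
/* Reaching this point means WM_QUIT arrived (e.g. from PostQuitMessage in WM_DESTROY). */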
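
And on the extension point, a hedged sketch of checking for a WGL extension before using its entry points. has_wgl_extension is an illustrative helper, a context must already be current, and a plain strstr match is good enough for a sketch:

#include <windows.h>
#include <string.h>

typedef const char* (WINAPI *PFNWGLGETEXTENSIONSSTRINGARBPROC)(HDC hdc);

static BOOL has_wgl_extension(HDC hdc, const char* name)
{
    PFNWGLGETEXTENSIONSSTRINGARBPROC get_ext =
        (PFNWGLGETEXTENSIONSSTRINGARBPROC) wglGetProcAddress("wglGetExtensionsStringARB");
    if (get_ext == NULL) {
        return FALSE;                          /* driver predates WGL extension strings */
    }
    const char* all = get_ext(hdc);
    return all != NULL && strstr(all, name) != NULL;
}

/* Usage: prefer the ARB path when present, otherwise fall back to the GDI functions. */
void pick_pixel_format(HDC hdc)
{
    if (has_wgl_extension(hdc, "WGL_ARB_pixel_format")) {
        /* load wglChoosePixelFormatARB via wglGetProcAddress and use it */
    } else {
        /* ChoosePixelFormat / SetPixelFormat */
    }
}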

@tilkinsc

tilkinsc commented Oct 7, 2022

Because your comments didn't add anything, not even a little bit. They weren't even coherent. This isn't meant to be documentation; these are my gists and I save code here. This is also an old version of this file. A more recent one can be found in my gists, but it has commented-out code from when I was experimenting with AMD extensions for WGL. This post was made 13 months ago.

Good to know that Windows isn't consistent in this and I would love to drop those checks. I originally developed this from reading what little wiki there was.

I know. Again, this wasn't meant to be a credible example; it's a personal paste.

No, you do not need to check for extensions to be present. WGL extensions are TOO widely supported, by even the crappiest of computers, and will always be available. If the person running the program doesn't have the capabilities to do so, then it isn't my problem. Turn your attention towards OpenGL extension checking instead. I have no intention of supporting such computers. That's on a personal note.

Remember, this is just getting contexts created, not actually doing anything. A context needs to be created before you can use wglChoosePixelFormatARB and friends; from there you can create another context with the correct pixel format. This I am not wrong on; without it, this example doesn't even work, or shouldn't.
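
For reference, the usual two-step dance that paragraph describes, as a minimal sketch. It assumes <windows.h> plus GL/wglext.h for the PFN typedefs; temp_hdc, real_hdc and the attribute lists stand in for the gist's actual values, and error handling is omitted:

HGLRC create_real_context(HDC temp_hdc, HDC real_hdc,
                          const int* pixel_attribs, const int* context_attribs)
{
    /* temp_hdc must already have a basic pixel format set via ChoosePixelFormat/SetPixelFormat */
    HGLRC temp = wglCreateContext(temp_hdc);
    wglMakeCurrent(temp_hdc, temp);

    /* only with a context current can the ARB entry points be resolved */
    PFNWGLCHOOSEPIXELFORMATARBPROC wglChoosePixelFormatARB =
        (PFNWGLCHOOSEPIXELFORMATARBPROC) wglGetProcAddress("wglChoosePixelFormatARB");
    PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB =
        (PFNWGLCREATECONTEXTATTRIBSARBPROC) wglGetProcAddress("wglCreateContextAttribsARB");

    int format;
    UINT count;
    PIXELFORMATDESCRIPTOR pfd;
    wglChoosePixelFormatARB(real_hdc, pixel_attribs, NULL, 1, &format, &count);
    DescribePixelFormat(real_hdc, format, sizeof(pfd), &pfd);
    SetPixelFormat(real_hdc, format, &pfd);

    /* the real context, made with the correct pixel format */
    HGLRC real = wglCreateContextAttribsARB(real_hdc, NULL, context_attribs);

    wglMakeCurrent(NULL, NULL);
    wglDeleteContext(temp);
    return real;
}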

This clear isn't weird. It ensures the window has a certain color before it is visible. It is setting the color of both buffers. While they aren't needed again this is my code and what I like.

Didn't know about SwapBuffers flushing.

Here is the gist: https://gist.github.com/tilkinsc/7f383faccf3722622f5d0cc9bd45e7e6

@tilkinsc

tilkinsc commented Oct 7, 2022

Check it out, I did a huge update.

I also came to the conclusion that having multiple render threads is more work than is needed, creates race conditions, breaks the OpenGL driver's ability to optimize, and has no point because nearly everything is async. I think the only time you would want multiple threads and contexts is when you have multiple windows.

@acceleration3

I didn't realize you had moved past this gist; I apologize. However, this gist is what shows up on Google when you search "WGL", which explains the level of engagement this particular gist gets.

You are right that using multiple threads is fighting against the design of OpenGL, but it DOES net you a significant performance boost; it just doesn't come entirely from the GPU (only the async DMA part). OpenGL draw commands can only be sent by one thread, and that can't be changed because that's how GPUs work in general, but if you generate commands on worker threads and get them executed on the render thread, you are utilizing much more of your CPU, not to mention the async DMA you get from a shared context in a separate thread. In fact, this is what Vulkan does. While OpenGL hides the "command queues" behind context objects, Vulkan gives you access to command buffers you can use. You mention drivers breaking, but they have actually evolved to accept this kind of process (and OpenGL's API has too), since developers are using OpenGL more and more like this, which even helps towards making an abstraction layer over OpenGL, Vulkan and DirectX 12.

If you want I can provide an example of OpenGL threading with software command queues and an async DMA thread.

@P-Squiddy

If you want I can provide an example of OpenGL threading with software command queues and an async DMA thread.

I would be interested in seeing this myself!
