Skip to content

Instantly share code, notes, and snippets.

@judwhite
Last active January 2, 2023 19:50
Show Gist options
  • Save judwhite/6693293ab8fe732ceeb32c2f3e535e55 to your computer and use it in GitHub Desktop.
Save judwhite/6693293ab8fe732ceeb32c2f3e535e55 to your computer and use it in GitHub Desktop.
ChatGPT Dec 15 Version (2022) Review: A know-it-all who simply doesn't.

ChatGPT Dec 15 Version (2022) Review

I wouldn't axe the subject matter experts quite yet.

ChatGPT lays down some solid answers to questions you'd get asked by someone who doesn't really want all the details. It's impressive as a work-in-progress, and a bit scary if your boss see this and immediately starts thinking about ways to reduce OpEx.

Not surprisingly, when you get it out of its comfort zone it attempts to overcompensate with length, spew jargon all over the place, and crank the humbleness up to 11 when you call it out on its bullshit. In other words, it's every smarmy know-it-all you've ever met.

Worst, for me, is how it doesn't adjust the way it presents information when it isn't confident. Whether it's summarizing the main ideas of a well-known topic or hurridly scanning Wikipedia in an attempt to respond quickly, it has the same feel in its response. I can't deal with that. Humans interacting with this thing need a hint about how it feels about its own response so we can establish some kind of Bayesian-updating through conversation. If it boldly lies to my face and I can't tell, other than I know the answer already, how can I trust it or my judgment of how trustworthy it is?

As of the Dec 15 version, ChatGPT is on a clear trajectory toward creating beginner level explainer videos on YouTube or running for office. Your job is safe, for now.

If you want, you can skip to the juicy parts.

And now friends, a conversation with ChatGPT

Which opening lines in chess were discovered with the computer assistance?

Computer chess has made significant contributions to the study of chess openings. One example is the Sicilian Defense, which is a popular choice for Black against 1.e4. The Sicilian Defense has many different variations, and computers have helped to identify which ones are most likely to lead to a favorable position for Black.

Another example is the Grunfeld Defense, which is another popular choice for Black against 1.d4. The Grunfeld Defense has also been extensively analyzed by computers, which has helped to identify the best ways for Black to counter White's attacks.

Other openings that have been extensively analyzed by computers include the Ruy Lopez, the King's Gambit, and the Queen's Gambit. In general, computers have been able to help chess players identify the most promising lines of play in a wide variety of openings, and have also helped to uncover new ideas and innovations in these openings.

Why am I not improving in chess after many months?

There could be a variety of reasons why you are not improving in chess after many months of practice. Here are a few possible explanations:

  1. Lack of consistent practice: Improving at chess requires consistent practice. If you are not practicing regularly, you may not be giving yourself enough opportunities to improve.
  2. Not challenging yourself enough: If you are only playing against opponents who are much weaker than you, you may not be adequately challenging yourself and pushing yourself to improve.
  3. Not analyzing your games: After you finish a game, it is important to take some time to analyze what you did well and what you could have done better. This can help you identify your weaknesses and work on improving them. If you are not doing this, you may not be aware of the areas where you need to improve.
  4. Not learning from your mistakes: It is natural to make mistakes in chess, but it is important to learn from those mistakes so that you don't repeat them in the future. If you are not paying attention to your mistakes and working on correcting them, you may not be improving as quickly as you could be.
  5. Not setting specific goals: Setting specific, achievable goals for yourself can help you stay motivated and focus on the areas where you need to improve. If you are not setting goals for yourself, you may not have a clear direction in your chess improvement journey.

There could be other reasons as well, and it may be helpful to get some outside perspective, such as from a chess coach or a stronger player, to help identify any potential issues and come up with a plan to address them.

How are transposition tables used in chess engines?

Transposition tables are used in chess engines to store information about positions that have been previously evaluated. This can help the engine to avoid re-analyzing positions that it has already seen, which can save a significant amount of time and allow the engine to search deeper in the game tree.

When a chess engine encounters a new position, it will first check to see if that position is stored in the transposition table. If it is, the engine will use the information stored in the table to guide its search. If the position is not in the transposition table, the engine will analyze the position and store the resulting evaluation in the table for future reference.

Transposition tables can be particularly useful in openings, where many of the same positions tend to arise repeatedly. By storing information about these positions in the transposition table, the engine can avoid re-analyzing them and instead focus on more promising lines of play.

Overall, transposition tables are an important tool in chess engines that can help them to search deeper and more efficiently, leading to stronger play.

What's the difference between fail-low and fail-high?

In the context of chess engines, "fail-low" and "fail-high" refer to the results of alpha-beta search, which is a search algorithm commonly used in chess engines to explore the game tree and evaluate positions.

During alpha-beta search, the engine will keep track of two bounds, known as alpha and beta, which represent the minimum and maximum values that the position can have, respectively. The engine will use these bounds to prune the search tree, only exploring lines of play that are likely to lead to a position with a value within these bounds.

If the engine reaches a position where it cannot find any moves that would lead to a position with a value within the bounds (i.e., a "fail-low" or "fail-high" position), it will return a special value called a "fail-low" or "fail-high" indication. This value tells the engine that it has reached the limits of the search and that it should stop searching further in that particular line of play.

Overall, the terms "fail-low" and "fail-high" refer to the results of alpha-beta search and are used to indicate that the search has reached the limits of the bounds and should stop.

How does pruning work?

In computer science, pruning refers to the process of eliminating parts of a search tree that are not likely to lead to a solution, in order to reduce the time and resources required to find a solution.

In the context of chess engines, pruning is often used to reduce the number of positions that the engine needs to analyze during its search. This can be particularly useful in complex positions where the search tree can become very large and time-consuming to explore.

There are several techniques that chess engines can use to prune the search tree, including alpha-beta pruning and null-move pruning.

Alpha-beta pruning is a technique that is commonly used in chess engines to reduce the number of positions that need to be searched. It works by keeping track of two bounds, known as alpha and beta, which represent the minimum and maximum values that the position can have, respectively. The engine will use these bounds to prune the search tree, only exploring lines of play that are likely to lead to a position with a value within these bounds.

Null-move pruning is another technique that can be used to prune the search tree. It works by assuming that the player to move will pass and allowing the opponent to make a move instead. If the resulting position is worse for the player who passed, the search can be pruned, since it is unlikely that the player would have chosen to pass in the actual game.

Overall, pruning is a useful technique that chess engines use to reduce the number of positions that need to be searched, allowing them to find the best move more efficiently.

Can you tell me more about null-move pruning?

Sure! Null-move pruning is a technique that is used in chess engines to reduce the number of positions that need to be searched. It works by assuming that the player to move will pass and allowing the opponent to make a move instead. If the resulting position is worse for the player who passed, the search can be pruned, since it is unlikely that the player would have chosen to pass in the actual game.

Here's how null-move pruning works in more detail:

  1. The engine determines that it is the player to move's turn.
  2. Instead of searching for a move, the engine "passes" and allows the opponent to make a move.
  3. The engine evaluates the resulting position and determines whether it is worse for the player who passed.
  4. If the position is worse for the player who passed, the search can be pruned, since it is unlikely that the player would have chosen to pass in the actual game.
  5. If the position is not worse for the player who passed, the search continues as normal.

Null-move pruning can be an effective technique for reducing the number of positions that need to be searched, especially in positions where the search tree is large and complex. However, it is not always appropriate to use null-move pruning, and chess engines will typically only use it in certain situations.

Overall, null-move pruning is a useful technique that can help chess engines to find the best move more efficiently by reducing the number of positions that need to be searched.

Can chess engine evaluations become less accurate with more depth?

In general, chess engine evaluations become more accurate with more depth, since searching deeper allows the engine to consider a wider range of possibilities and make more informed decisions. However, it is also possible for chess engine evaluations to become less accurate with more depth in certain situations.

One reason for this is that the search tree can become very large at high depths, making it difficult for the engine to explore all of the possible lines of play. This can lead to the engine missing important tactics or strategic ideas, which can result in less accurate evaluations.

Another reason is that the accuracy of chess engine evaluations is limited by the evaluation function that the engine uses. Even if the engine is able to search very deeply, the evaluation function may not be sophisticated enough to accurately evaluate the resulting positions.

Overall, while chess engine evaluations generally become more accurate with more depth, there are situations where they can become less accurate as well.

but if a line is incorrectly pruned due to limitations of the eval function, couldn't those inaccuracies compound as the depth increases?

Yes, that's correct. If a line is incorrectly pruned due to limitations of the evaluation function, the inaccuracies can compound as the depth increases, since the engine will not explore that line of play further. This can lead to the engine missing important tactics or strategic ideas, which can result in less accurate evaluations.

It's important to note that chess engines use a variety of techniques to try to avoid pruning lines of play that could be important. For example, they may use a variety of search heuristics, such as "killer moves" and "history heuristics," to prioritize the search and avoid pruning promising lines of play. They may also use techniques such as null-move pruning and late move reductions to prune the search tree more selectively and avoid pruning important lines of play.

Overall, while it is possible for inaccuracies to compound if a line is incorrectly pruned, chess engines use a variety of techniques to try to minimize this risk and find the best move as accurately as possible.

What is LMR?

LMR, or Late Move Reductions, is a technique that is used in chess engines to reduce the number of positions that need to be searched. It works by reducing the depth of the search for moves that are made later in the search tree.

The idea behind LMR is that moves made later in the search tree are less likely to be the best move, since the engine has already considered a wider range of alternatives. By reducing the depth of the search for these moves, the engine can save time and resources without sacrificing too much accuracy.

LMR is typically used in conjunction with other search techniques, such as alpha-beta pruning and null-move pruning, to further reduce the number of positions that need to be searched. It can be particularly effective in complex positions where the search tree is large and deep.

Overall, LMR is a useful technique that chess engines use to find the best move more efficiently by reducing the number of positions that need to be searched.

What is Move Ordering and why is it important?

Move ordering is the process of deciding the order in which the engine will search through the different moves that are available in a particular position. It is an important factor in chess engines because the order in which the moves are searched can have a significant impact on the efficiency of the search.

There are several factors that can be taken into account when determining the move ordering, including:

  1. History heuristics: This involves considering the moves that have been successful in similar positions in the past.
  2. Killer moves: This involves prioritizing moves that have been successful in the past and are likely to be successful again.
  3. Static evaluation: This involves considering the expected value of each move based on the current position.
  4. Tactical considerations: This involves considering moves that may be able to capture material or threaten the opponent's pieces.

Overall, move ordering is important because it can help the engine to find the best move more efficiently by prioritizing the most promising lines of play. By considering a variety of factors, the engine can search through the different moves in an optimal order, which can save time and resources and lead to stronger play.

What's a killer move?

A killer move is a move that has been successful in the past and is likely to be successful again. In the context of chess engines, killer moves are often used to improve the efficiency of the search by prioritizing moves that have a higher probability of being the best move.

Killer moves are typically used in conjunction with other search techniques, such as alpha-beta pruning and null-move pruning, to further reduce the number of positions that need to be searched. They can be particularly effective in complex positions where the search tree is large and deep.

There are a few different ways that killer moves can be implemented in chess engines. For example, the engine may keep track of a "killer list" for each ply (depth) in the search tree, which contains a set of moves that have been successful in the past. When the engine is searching at that ply, it will prioritize the killer moves and search them before other moves.

Overall, killer moves are a useful technique that chess engines use to find the best move more efficiently by prioritizing moves that have a higher probability of being successful.

Can I get an example?

Certainly! Here's an example of how killer moves might be used in a chess engine:

Imagine that the engine is searching a position and has reached a depth of 4 ply. At this point, it has generated the following list of moves:

  1. e4
  2. d3
  3. Nc3
  4. Bd2
  5. Qh5

Using killer moves, the engine might prioritize the moves as follows:

  1. Qh5 (killer move)
  2. e4 (killer move)
  3. d3
  4. Nc3
  5. Bd2

The engine would then search the moves in this order, starting with Qh5. If Qh5 proves to be a good move, the engine will continue searching the remaining moves in order. If Qh5 is not a good move, the engine will move on to e4 and so on.

By prioritizing killer moves, the engine can save time and resources by focusing on the most promising lines of play first. This can help the engine to find the best move more efficiently and lead to stronger play.

I hope that helps! Let me know if you have any other questions.

What's tree explosion, what causes it, and how do you prevent it from happening?

Tree explosion is a phenomenon that can occur in chess engines when the search tree becomes too large to be efficiently searched. This can happen when the engine is searching deep into the game and there are a large number of possible moves to consider.

There are a few factors that can contribute to tree explosion, including:

  1. Complex positions: Positions with a lot of tactical and strategic possibilities can generate a large search tree, which can be difficult for the engine to explore efficiently.
  2. Insufficient pruning: If the engine is not using pruning techniques effectively, it may end up searching a larger portion of the search tree than is necessary, which can lead to tree explosion.
  3. Poor move ordering: If the engine is not searching the most promising moves first, it may end up searching a larger portion of the search tree than is necessary, which can also contribute to tree explosion.

To prevent tree explosion, chess engines use a variety of techniques, including:

  1. Pruning techniques: Techniques such as alpha-beta pruning and null-move pruning can help the engine to reduce the number of positions that need to be searched, which can prevent tree explosion.
  2. Move ordering: By prioritizing the most promising moves first, the engine can avoid searching large portions of the search tree that are unlikely to be helpful.
  3. Specialized search algorithms: Some chess engines use specialized search algorithms, such as Aspiration Search or Iterative Deepening, which can help to reduce the risk of tree explosion.

Overall, tree explosion is a phenomenon that can occur in chess engines when the search tree becomes too large to be efficiently searched. To prevent tree explosion, chess engines use a variety of techniques, including pruning, move ordering, and specialized search algorithms.

Ok, briefly, what are aspiration search and iterative deepening?

Aspiration search and iterative deepening are two techniques that are commonly used in chess engines to improve the efficiency of the search.

Aspiration search is a search technique that is used to fine-tune the search in positions where the evaluation is likely to be close to the alpha-beta bounds. It works by starting the search with a "window" around the expected evaluation, and then expanding the window if the search fails to find a move within the bounds. By using aspiration search, the engine can avoid searching large portions of the search tree that are unlikely to be helpful and focus on the most promising lines of play.

Iterative deepening is a search technique that involves gradually increasing the depth of the search until a certain time limit is reached. The engine starts by searching the shallowest positions first and gradually works its way deeper into the search tree. This can be an effective technique for finding the best move, since it allows the engine to search a wider range of positions without running out of time.

Overall, aspiration search and iterative deepening are two techniques that are commonly used in chess engines to improve the efficiency of the search and find the best move more quickly.

What's the difference between MCTS and Alpha-Beta search?

Monte Carlo Tree Search (MCTS) and Alpha-Beta search are two different algorithms that are commonly used in chess engines to explore the game tree and find the best move.

One major difference between MCTS and Alpha-Beta search is the way that they explore the search tree. Alpha-Beta search uses a traditional "minimax" approach, in which it considers all possible moves and then selects the move that leads to the best position for the player and the worst position for the opponent. MCTS, on the other hand, uses a more probabilistic approach, in which it simulates the outcome of different moves by "rolling out" the game from the current position and selecting the move that leads to the most favorable outcomes.

Another difference between MCTS and Alpha-Beta search is the way that they handle transpositions (repeated positions). Alpha-Beta search uses a transposition table to store information about positions that have been previously evaluated, which can help the engine to avoid re-analyzing them. MCTS, on the other hand, does not use a transposition table and instead relies on the statistics of the simulations to guide the search.

Overall, while MCTS and Alpha-Beta search are both commonly used in chess engines, they are based on different approaches to exploring the search tree and handle transpositions in different ways.

What doesn't MCTS use a TT?

Monte Carlo Tree Search (MCTS) is a search algorithm that is commonly used in chess engines and other types of artificial intelligence. MCTS does not use a transposition table (TT), which is a data structure that is used to store information about positions that have been previously evaluated.

One reason why MCTS does not use a transposition table is that it is based on a different approach to exploring the search tree than traditional search algorithms like alpha-beta search. MCTS uses a probabilistic approach, in which it simulates the outcome of different moves by "rolling out" the game from the current position and selecting the move that leads to the most favorable outcomes. Because MCTS relies on the statistics of the simulations to guide the search, it does not need to store information about previous positions in a transposition table.

Another reason why MCTS does not use a transposition table is that it can be computationally expensive to maintain and update the transposition table as the search progresses. This can be particularly challenging in complex positions where the search tree is large and deep. By not using a transposition table, MCTS can avoid the overhead of maintaining this data structure and focus on exploring the search tree more efficiently.

Overall, while transposition tables are a useful tool for traditional search algorithms like alpha-beta search, MCTS does not use a transposition table due to its different approach to exploring the search tree and the overhead of maintaining the data structure.

And the cracks begin to show...

In the context of chess engines, what's a pseudo-legal move?

In the context of chess engines, a pseudo-legal move is a move that is legal according to the rules of chess, except for the requirement that it cannot leave the king in check. Pseudo-legal moves are often used during the search process to generate a list of possible moves for a given position, since they can be considered more quickly than legal moves.

Pseudo-legal moves can be useful for chess engines because they allow the engine to generate a larger number of moves for a given position, which can help to find the best move more efficiently. However, it is important to note that not all pseudo-legal moves are legal moves, and the engine will need to check each pseudo-legal move to determine whether it is actually a legal move.

Overall, pseudo-legal moves are a useful concept in the context of chess engines, as they allow the engine to generate a larger number of moves for a given position and find the best move more efficiently.

So, would the set of pseudo-legal moves consider obstructing pieces, leaving the king in check, diagonal pawn moves, etc? I'm confused on the difference between pseudo-legal and strictly legal moves. What's in the pseudo-legal set that isn't legal?

A pseudo-legal move is a move that is legal according to the rules of chess, except for the requirement that it cannot leave the king in check. This means that a pseudo-legal move must consider obstructing pieces, as well as other rules of chess such as pawn promotion and en passant capture. However, it does not need to consider whether the move would put the player's own king in check.

Here are some examples of what is included in the set of pseudo-legal moves:

  • Moving a piece to a square that is occupied by an opponent's piece (capturing the piece)
  • Moving a pawn to a square that is occupied by an opponent's pawn that has just advanced two squares from its starting position (en passant capture)
  • Moving a pawn to the eighth rank and promoting it to a queen, rook, bishop, or knight
  • Castling, as long as the king and rook have not moved, the squares between the king and the rook are unoccupied, and the king is not in check

Here is an example of what is NOT included in the set of pseudo-legal moves:

  • Moving a piece that would put the player's own king in check

Overall, the set of pseudo-legal moves includes most of the legal moves in chess, but it does not include moves that would put the player's own king in check.

In your last response, there appears to be ambiguity with the use of the word "included". At times, it appears you're saying the move is "included" in the set of pseudo-legal moves, and at other times it appears you mean "included" as in "considered in the pseudo-legality test". In particular, the last bullet point seems to contradict the summary.

I apologize for the confusion. You are correct that there is ambiguity in my use of the word "included" in my previous response. To clarify:

  • The first three bullet points describe moves that are included in the set of pseudo-legal moves. These are moves that are legal according to the rules of chess, except for the requirement that they cannot leave the king in check.
  • The last bullet point describes a move that is NOT included in the set of pseudo-legal moves. This is because moving a piece that would put the player's own king in check is not allowed in chess, and therefore it is not considered a pseudo-legal move.

I apologize for any confusion that my previous response may have caused. I hope this clears up any misunderstanding. Please let me know if you have any other questions.

That's actually incorrect. In the context of chess engines, pseudo-legal moves include moves that leave the player's own king in check. The primary reason for this is because calculating whether the position leaves the player's king in check is computationally expensive. The strict legality test is deferred mainly to take advantage of TT's and other move hash tables, which will be checked anyway and imply move legality.

Thank you for bringing this to my attention. You are correct that in the context of chess engines, pseudo-legal moves do include moves that leave the player's own king in check. I apologize for the error in my previous response.

In chess engines, pseudo-legal moves are used to generate a list of possible moves for a given position. The primary reason for using pseudo-legal moves is to save time and resources by avoiding the need to perform a full legality check for each move. This is particularly important in complex positions where the search tree is large and deep, as it can be computationally expensive to calculate whether each move is legal.

By using pseudo-legal moves, chess engines can generate a larger number of moves for a given position more quickly, which can help to find the best move more efficiently. However, it is important to note that not all pseudo-legal moves are legal moves, and the engine will need to check each pseudo-legal move to determine whether it is actually a legal move.

I apologize for the misunderstanding in my previous response. I hope this clears up any confusion. Please let me know if you have any other questions.

When training NNUE, is there general consensus on whether the network should be trained on quiet positions, complicated positions, or both?

In the context of training a neural network for evaluation (NNUE), it is generally recommended to train the network on a wide range of positions in order to improve its generalization ability. This may include quiet positions, complicated positions, and positions with a variety of other characteristics.

Training the network on a diverse range of positions can help to ensure that it is able to handle a wide variety of positions effectively, rather than just being specialized for one type of position. It can also help the network to learn more general patterns and features that are relevant to chess, rather than just memorizing specific positions.

That being said, it is important to carefully balance the number and type of positions that are used for training in order to ensure that the network is able to learn effectively. It may be helpful to include a mix of quiet positions, complicated positions, and positions with other characteristics in the training set, as well as positions from different phases of the game (e.g. opening, middlegame, endgame).

Overall, while there is no one-size-fits-all approach to training NNUE, it is generally recommended to train the network on a diverse range of positions in order to improve its generalization ability.

That is far too many words, my friend. Can you summarize the previous response?

Certainly, here is a summary of my previous response:

When training a neural network for evaluation (NNUE), it is generally recommended to use a diverse range of positions in order to improve the network's generalization ability. This may include quiet positions, complicated positions, and positions with other characteristics, as well as positions from different phases of the game. It is important to carefully balance the number and type of positions used for training in order to ensure that the network is able to learn effectively.

Are you sure? I thought Stockfish only trained the network on quiet positions, preferring to allow the Alpha-Beta search to handle tactical positions. Can you see which of us are correct?

I apologize for the misunderstanding. You are correct that the Stockfish chess engine uses a neural network for evaluation (NNUE) that is trained only on quiet positions.

In the Stockfish chess engine, the NNUE is used to evaluate positions that are considered "quiet" by the alpha-beta search. These are positions where there are no more tactical possibilities and the evaluation can be more reliable. The NNUE is trained on a large dataset of quiet positions in order to learn patterns and features that are relevant to chess evaluation.

The alpha-beta search, on the other hand, is responsible for handling tactical positions and finding the best move in such positions. The alpha-beta search uses a variety of techniques, including pruning, move ordering, and specialized search algorithms, to explore the search tree efficiently and find the best move.

Overall, the Stockfish chess engine uses a combination of alpha-beta search and NNUE to find the best move in a given position. The alpha-beta search handles tactical positions, while the NNUE is used to evaluate quiet positions. I apologize for the misunderstanding in my previous response.

Can you describe the network architecture of NNUE?

The network architecture of a neural network for evaluation (NNUE) is the structure of the network, including the number and types of layers, the number of units or neurons in each layer, and the connections between the layers. The architecture of a NNUE can have a significant impact on its performance and effectiveness.

There are a wide variety of possible network architectures that could be used for a NNUE, and the specific architecture that is best suited for a particular application will depend on a variety of factors, such as the size of the training dataset, the complexity of the positions being evaluated, and the computational resources available.

One common architecture for a NNUE is a fully connected feedforward neural network, which consists of multiple layers of neurons that are fully connected to each other. The input layer of the network receives the raw board position as input, and the output layer produces a score for the position. The hidden layers of the network may contain a variable number of neurons and are used to learn complex relationships and patterns in the data.

Other types of network architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), may also be used for NNUE, depending on the specific requirements of the application.

Overall, the architecture of a NNUE is an important factor that can impact its performance and effectiveness, and there are many different architectures that could be used depending on the specific needs of the application.

Can you be more specific about hidden layers, activation functions, dropout, and other features of the NNUE architecure?

Certainly, here is more information about hidden layers, activation functions, dropout, and other features of neural network for evaluation (NNUE) architecture:

  • Hidden layers: Hidden layers are layers of neurons in a neural network that are located between the input layer and the output layer. They are called "hidden" because they are not directly connected to the input or output layers. The purpose of hidden layers is to learn complex relationships and patterns in the data, which can be used to make more accurate predictions or evaluations.
  • Activation functions: Activation functions are mathematical functions that are used to introduce nonlinearity into the neural network. They are typically applied to the output of each neuron in the network, and they determine the output of the neuron based on the input. Activation functions can help the network to learn more complex patterns and features in the data, and different activation functions can be used to achieve different results.
  • Dropout: Dropout is a regularization technique that is used to prevent overfitting in neural networks. It involves randomly setting a fraction of the neurons in the network to zero during training, which can help to reduce the complexity of the model and prevent it from memorizing the training data. Dropout can be applied to any layer of the network, including the input layer, hidden layers, and output layer.

Other features of NNUE architecture may include:

  • Batch normalization: Batch normalization is a technique that is used to improve the stability and convergence of a neural network. It involves normalizing the activations of the neurons in a layer across a batch of input data, which can help to reduce the internal covariate shift and improve the performance of the network.
  • Early stopping: Early stopping is a technique that is used to prevent overfitting in neural networks. It involves monitoring the performance of the network on a validation set during training and stopping the training process when the performance starts to degrade. This can help to prevent the network from overfitting to the training data and improve its generalization ability.

Overall, there are many different features of NNUE architecture that

Okay, yes. You described a large class of feed-forward networks in general, but I'm asking specifically about NNUE used by chess engines. How many layers do they use, for example?

I apologize for the misunderstanding. You are correct that the specific architecture of a neural network for evaluation (NNUE) used by a chess engine may differ from the general description of feedforward neural networks that I provided in my previous response.

NNUE architectures used by chess engines are typically designed to be efficient and compact, in order to minimize the computational resources required for evaluation. The number of layers and the number of units in each layer may vary depending on the specific requirements of the engine, but NNUE architectures used by chess engines tend to be relatively shallow (i.e., with fewer layers) compared to other types of neural networks.

One example of an NNUE architecture used by a chess engine is the "compact neural network" used by the Lc0 chess engine. This network has a total of 5 layers, including an input layer, 3 hidden layers, and an output layer. The input layer receives the raw board position as input, and the output layer produces a score for the position. The hidden layers contain a small number of units (typically fewer than 100) and are used to learn complex relationships and patterns in the data.

Other chess engines may use different NNUE architectures, with a different number of layers and units. The specific architecture that is best suited for a particular engine will depend on a variety of factors, such as the size of the training dataset, the complexity of the positions being evaluated, and the computational resources available.

Overall, NNUE architectures used by chess engines tend to be relatively shallow and compact in order to minimize the computational resources required for evaluation. The specific architecture of an NNUE may vary depending on the specific requirements of the engine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment