Skip to content

Instantly share code, notes, and snippets.

@erikdubbelboer
Created February 5, 2025 05:07
Show Gist options
  • Save erikdubbelboer/529ba1ab62e7c9603a84b454be595da9 to your computer and use it in GitHub Desktop.
Save erikdubbelboer/529ba1ab62e7c9603a84b454be595da9 to your computer and use it in GitHub Desktop.
Asking Google Gemini to look at https://github.com/poki/netlib

Okay, this is a large codebase. Here's a breakdown of potential issues and bugs, focusing on areas that stand out:

cmd/signaling/main.go

  • Error Handling: The error handling in main.go is a bit aggressive. logger.Panic will terminate the application, which might not be desirable in a production environment. Consider using logger.Fatal or logging the error and attempting to recover.
  • Random Seed: While the code seeds the random number generator, seeding with time.Now().UnixNano() isn't ideal for security-sensitive applications. If the generated random numbers are used for any kind of security purpose (unlikely in this code, but worth noting), a more secure random source should be used.
  • Potential resource leak: I can't see a close DB connection anywhere. While the server has defer logger.Sync(), I can't see the DB connection to postgres closed correctly. While the cloudflare credentials client does have a defer cancel(), I can't see any defer cancel() for the DB connection in this main.go or in the stores package.

cmd/testproxy/main.go

  • Error Handling: Similar to cmd/signaling/main.go, the panic calls are not ideal for a production-like application.
  • Security Vulnerability in /sql Handler: The /sql handler executes arbitrary SQL queries from the request body. This is a major security vulnerability and should never be exposed outside of a tightly controlled testing environment. There's a comment saying "This process is only ran during tests." but the code doesn't enforce this. A compromised test environment could potentially escalate to a production database breach if misconfigured.
  • Resource Leaks in /create Handler: There are several panic calls, which will not defer conn.Close(). So, a lot of UDP connections may stay open if something goes wrong.
  • Race Condition in /create handler: The interrupts map and the connections map is mutated by multiple HTTP request handlers. This means that multiple HTTP requests may operate and mutate the maps at the same time. This is a race condition, which may cause undefined behavior.
  • Missing input validation on /create handler: the port argument is not validated, so the testproxy may try to listen to a restricted port.
  • Missing checks for io.ReadAll(r.Body) on /sql handler: If an attacker sends a very large request body, this can DoS the server.
  • **/create handler uses addr.Port != raddr.Port as a check to determine who to send the UDP packet to. This assumption is wrong. UDP source ports are often random and can be the same.

internal/cloudflare/credentials.go

  • Race Condition on HasFetchedFirstCredentials: There is a race condition that happens because HasFetchedFirstCredentials is never used and it's only written to.
  • Error Handling: The retry mechanism for fetching credentials could benefit from exponential backoff with jitter for better resilience and to avoid thundering herd problems.
  • Missing check for non-production environments: There is an early return in the Run function that returns if the appID is empty and the environment is not production. However, there is nothing stopping a user from setting ENV=production when running locally, thus bypassing the check.

internal/metrics/client.go

  • Data Validation: The Record function panics if the data slice doesn't have an even length. This should be handled more gracefully (e.g., logging an error and returning).
  • Idempotency Key Generation: The xid library is used for idempotency key generation. While xid is generally good, consider using UUIDs directly for broader compatibility and potentially better collision resistance if needed.
  • Error on metrics.RecordEvent: The logger calls return instead of continue in the retry loop. This will prevent retries and cause the metrics endpoint to be unreliable.
  • Nil dereference on remote: In the /create handler, the remote variable is nil, so a nil-dereference will happen here if addr.Port == remote.Port {

internal/signaling/stores/postgres.go

  • SQL Injection Vulnerability in ListLobbies: The ListLobbies function uses string concatenation to build the SQL query with the where clause. Even though the filterConverter attempts to sanitize the input, there's still a risk of SQL injection if the filter conversion logic has any flaws or if unexpected input makes its way through. Prepared statements should be used instead.
  • Topic Length Validation: The Publish function checks the topic length, but the check is insufficient given that the total length of the message is base64.StdEncoding.EncodedLen(len(data)) + len(topic) + 1 and that length is checked later.
  • "Leader" Column Default Value in Migrations: The 1718348556_leader.up.sql migration adds a "leader" column with a default value of ''. This could be problematic if the application relies on NULL to indicate no leader, and needs to have a default value of NULL instead.
  • Race Condition in Subscribe: The s.nextCallbackIndex is incremented, but the whole function is locked with a mutex, but not all variables within the function.
  • Possible DoS in Publish method: The publish method does a NOTIFY lobbies, '+payload+`'`` which is passed as a string, which is then SQL injected directly. An attacker could exploit this to cause DoS by flooding the connection.
  • Hardcoded value on ListLobbies In the SQL query, limit is hardcoded to 50. There is no way to increase the upper limit.

internal/signaling/handler.go

  • No Origin Validation: The code skips origin validation.
  • Potential DoS via uncontrolled memory growth: There is a risk of uncontrolled memory growth due to unbounded channel buffering or map sizes within the signaling handler.

internal/signaling/peer.go

  • Too permissive password checking: Code does not validate password for special characters.
  • Error on uninitialized lobby or peers: There is a potential risk of logic errors, such as null pointer dereferences, if Lobby or ID are not initialized correctly before use.

lib/*.ts

  • Credentials.ts - Potential infinite loop / race condition: The runningPromise is set to undefined after the await this.runningPromise. This could lead to a race condition if multiple components simultaneously request credentials. One component will set runningPromise to undefined while others are still waiting for the original promise, potentially triggering a new (unnecessary) credential fetch and a loop, in certain circumstances.
  • Network.ts - Error Handling on setLobbySettings: Returns Error object but the callsite has no type checks for handling Error vs true

manifest/*.yaml

  • Missing Resource Limits: The deployment manifest lacks resource limits (CPU, memory). This could lead to resource exhaustion and instability if the application experiences high load.
  • Health Probe: The health probe is missing.

General

  • Consistent Error Handling: The code mixes different error handling strategies (panics, ErrorAndAbort, returning errors). A consistent approach should be adopted throughout the codebase.
  • Logging: There are a lot of logging.Info and logging.Debug messages that get executed in a loop. Consider reducing the amount of logging and adding metrics instead, as logging has overhead.
  • Missing tests: The codebase has very few tests, such as no unit tests for the lib directory.
  • Missing input validation: there is no input sanitization before arguments are used.
  • Testproxy SQL injection still exists: The testproxy can still be SQL injected, which puts the real production Postgres database at risk.

Recommendations

  1. Prioritize Security: Address the SQL injection vulnerability in stores/postgres.go and the arbitrary SQL execution in the testproxy.
  2. Improve Error Handling: Replace panic calls with more graceful error handling and logging.
  3. Resource Management: Add resource limits to the deployment manifest and fix the potential resource leaks.
  4. Thorough Testing: Implement unit tests for the lib directory and integration tests to verify the behavior of different components.
  5. Review and Refactor: Consider refactoring the code to improve readability, maintainability, and overall design.

This analysis highlights potential areas for improvement. A deeper dive and more specific testing are recommended to confirm and fully address these issues.

Okay, this is a large codebase. Here's a breakdown of potential issues and bugs, focusing on areas that stand out:

cmd/signaling/main.go

  • Error Handling: The error handling in main.go is a bit aggressive. logger.Panic will terminate the application, which might not be desirable in a production environment. Consider using logger.Fatal or logging the error and attempting to recover.
  • Random Seed: While the code seeds the random number generator, seeding with time.Now().UnixNano() isn't ideal for security-sensitive applications. If the generated random numbers are used for any kind of security purpose (unlikely in this code, but worth noting), a more secure random source should be used.
  • Potential resource leak: I can't see a close DB connection anywhere. While the server has defer logger.Sync(), I can't see the DB connection to postgres closed correctly. While the cloudflare credentials client does have a defer cancel(), I can't see any defer cancel() for the DB connection in this main.go or in the stores package.

cmd/testproxy/main.go

  • Error Handling: Similar to cmd/signaling/main.go, the panic calls are not ideal for a production-like application.
  • Security Vulnerability in /sql Handler: The /sql handler executes arbitrary SQL queries from the request body. This is a major security vulnerability and should never be exposed outside of a tightly controlled testing environment. There's a comment saying "This process is only ran during tests." but the code doesn't enforce this. A compromised test environment could potentially escalate to a production database breach if misconfigured.
  • Resource Leaks in /create Handler: There are several panic calls, which will not defer conn.Close(). So, a lot of UDP connections may stay open if something goes wrong.
  • Race Condition in /create handler: The interrupts map and the connections map is mutated by multiple HTTP request handlers. This means that multiple HTTP requests may operate and mutate the maps at the same time. This is a race condition, which may cause undefined behavior.
  • Missing input validation on /create handler: the port argument is not validated, so the testproxy may try to listen to a restricted port.
  • Missing checks for io.ReadAll(r.Body) on /sql handler: If an attacker sends a very large request body, this can DoS the server.
  • **/create handler uses addr.Port != raddr.Port as a check to determine who to send the UDP packet to. This assumption is wrong. UDP source ports are often random and can be the same.

internal/cloudflare/credentials.go

  • Race Condition on HasFetchedFirstCredentials: There is a race condition that happens because HasFetchedFirstCredentials is never used and it's only written to.
  • Error Handling: The retry mechanism for fetching credentials could benefit from exponential backoff with jitter for better resilience and to avoid thundering herd problems.
  • Missing check for non-production environments: There is an early return in the Run function that returns if the appID is empty and the environment is not production. However, there is nothing stopping a user from setting ENV=production when running locally, thus bypassing the check.

internal/metrics/client.go

  • Data Validation: The Record function panics if the data slice doesn't have an even length. This should be handled more gracefully (e.g., logging an error and returning).
  • Idempotency Key Generation: The xid library is used for idempotency key generation. While xid is generally good, consider using UUIDs directly for broader compatibility and potentially better collision resistance if needed.
  • Error on metrics.RecordEvent: The logger calls return instead of continue in the retry loop. This will prevent retries and cause the metrics endpoint to be unreliable.
  • Nil dereference on remote: In the /create handler, the remote variable is nil, so a nil-dereference will happen here if addr.Port == remote.Port {

internal/signaling/stores/postgres.go

  • SQL Injection Vulnerability in ListLobbies: The ListLobbies function uses string concatenation to build the SQL query with the where clause. Even though the filterConverter attempts to sanitize the input, there's still a risk of SQL injection if the filter conversion logic has any flaws or if unexpected input makes its way through. Prepared statements should be used instead.
  • Topic Length Validation: The Publish function checks the topic length, but the check is insufficient given that the total length of the message is base64.StdEncoding.EncodedLen(len(data)) + len(topic) + 1 and that length is checked later.
  • "Leader" Column Default Value in Migrations: The 1718348556_leader.up.sql migration adds a "leader" column with a default value of ''. This could be problematic if the application relies on NULL to indicate no leader, and needs to have a default value of NULL instead.
  • Race Condition in Subscribe: The s.nextCallbackIndex is incremented, but the whole function is locked with a mutex, but not all variables within the function.
  • Possible DoS in Publish method: The publish method does a NOTIFY lobbies, '+payload+`'`` which is passed as a string, which is then SQL injected directly. An attacker could exploit this to cause DoS by flooding the connection.
  • Hardcoded value on ListLobbies In the SQL query, limit is hardcoded to 50. There is no way to increase the upper limit.

internal/signaling/handler.go

  • No Origin Validation: The code skips origin validation.
  • Potential DoS via uncontrolled memory growth: There is a risk of uncontrolled memory growth due to unbounded channel buffering or map sizes within the signaling handler.

internal/signaling/peer.go

  • Too permissive password checking: Code does not validate password for special characters.
  • Error on uninitialized lobby or peers: There is a potential risk of logic errors, such as null pointer dereferences, if Lobby or ID are not initialized correctly before use.

lib/*.ts

  • Credentials.ts - Potential infinite loop / race condition: The runningPromise is set to undefined after the await this.runningPromise. This could lead to a race condition if multiple components simultaneously request credentials. One component will set runningPromise to undefined while others are still waiting for the original promise, potentially triggering a new (unnecessary) credential fetch and a loop, in certain circumstances.
  • Network.ts - Error Handling on setLobbySettings: Returns Error object but the callsite has no type checks for handling Error vs true

manifest/*.yaml

  • Missing Resource Limits: The deployment manifest lacks resource limits (CPU, memory). This could lead to resource exhaustion and instability if the application experiences high load.
  • Health Probe: The health probe is missing.

General

  • Consistent Error Handling: The code mixes different error handling strategies (panics, ErrorAndAbort, returning errors). A consistent approach should be adopted throughout the codebase.
  • Logging: There are a lot of logging.Info and logging.Debug messages that get executed in a loop. Consider reducing the amount of logging and adding metrics instead, as logging has overhead.
  • Missing tests: The codebase has very few tests, such as no unit tests for the lib directory.
  • Missing input validation: there is no input sanitization before arguments are used.
  • Testproxy SQL injection still exists: The testproxy can still be SQL injected, which puts the real production Postgres database at risk.

Recommendations

  1. Prioritize Security: Address the SQL injection vulnerability in stores/postgres.go and the arbitrary SQL execution in the testproxy.
  2. Improve Error Handling: Replace panic calls with more graceful error handling and logging.
  3. Resource Management: Add resource limits to the deployment manifest and fix the potential resource leaks.
  4. Thorough Testing: Implement unit tests for the lib directory and integration tests to verify the behavior of different components.
  5. Review and Refactor: Consider refactoring the code to improve readability, maintainability, and overall design.

This analysis highlights potential areas for improvement. A deeper dive and more specific testing are recommended to confirm and fully address these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment