@ghousemohamed
Last active February 9, 2024 18:26
Issues reported since Nov 14th, 2023

The following issues have been reported multiple times in the #neeto-hq and #neeto-deploy channels:

  1. Postgres connection issue:
Caused by:
PG::ConnectionBad: connection to server at "10.100.111.33", port 5432 failed: Connection refused (PG::ConnectionBad)
	Is the server running on that host and accepting TCP/IP connections?

Although the frequency of this error has dropped considerably, it still crops up now and then. The relevant issue is still being tracked: https://github.com/bigbinary/neeto-deploy-web/issues/2513

  1. Failure to start build because of corrupt base image:
getting previous image: getting config file for image "10.100.0.20:5000/neeto-deploy/v2/builds:neeto-desk-web-staging": unexpected EOF

This issue will be addressed once we switch to the Harbor registry. These occurrences should already be greatly reduced, since we have now allocated more resources to our default registry.

  1. Redis and Postgres addons running out of memory/space:

For Postgres, we allocate 1 GB of storage by default. Once this space fills up, the neeto app using the DB crashes. We have a basic mechanism that alerts us beforehand via email when this is about to happen.

For Redis, the following happens because of memory overflow. It usually happens with neetoAuth, since neetoAuth uses a lot of Redis keys. I assume this problem will go away, since we are making the switch to solid_(cache/queue).

Screenshot 2024-02-09 at 11 42 17 PM
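
For the Postgres storage case above, the idea of alerting before the volume fills up can be sketched roughly as follows. This is only an illustration and not the actual neetoDeploy mechanism: the 80% threshold, the constant names, and the helper `storage_alert_needed?` are all assumptions, and the 1 GB default is treated here as 1 GiB.

```ruby
# Default addon storage mentioned above (assumed here to mean 1 GiB).
DEFAULT_STORAGE_BYTES = 1024**3

# Hypothetical threshold: alert once 80% of the allocation is used.
ALERT_THRESHOLD = 0.8

# Returns true when the used space has reached the alert threshold.
def storage_alert_needed?(used_bytes, capacity_bytes = DEFAULT_STORAGE_BYTES, threshold = ALERT_THRESHOLD)
  used_bytes.to_f / capacity_bytes >= threshold
end
```

A scheduled job could call this with the current database size and send the email when it returns true. Where the size comes from is left open here.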
  1. Incorrectly configured health checks: The root path, or the path defined as the custom health check URL, should return a status code in the 2xx-3xx range. Some apps, like neeto-og-generator, return 404 for the root path, which is why these were reported.

  2. ImagePullBackOff errors: These will go away once we migrate to Harbor. A temporary fix for this has already been added.

  3. Issue with the console not loading in the Web/CLI. No remedy or tracking issue exists for this error yet.
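
The health check rule from the list above (a status code in the 2xx-3xx range passes, anything else fails) can be sketched as a small check to run against an app before reporting a problem. This is a minimal sketch, not the platform's actual implementation: `healthy_status?` and `health_check_passes?` are hypothetical helper names, and following redirects or timeouts is not handled.

```ruby
require "net/http"
require "uri"

# Status codes treated as healthy: the 2xx-3xx range described above.
HEALTHY_STATUSES = (200..399)

# Returns true when the given status code (Integer or String) is healthy.
def healthy_status?(code)
  HEALTHY_STATUSES.cover?(code.to_i)
end

# Hypothetical helper: requests the URL once and reports whether the
# response status would pass the health check. Any network error counts
# as a failure.
def health_check_passes?(url)
  response = Net::HTTP.get_response(URI(url))
  healthy_status?(response.code)
rescue StandardError
  false
end
```

For an app whose root path returns 404, `healthy_status?(404)` is false, so either the root path has to respond in the healthy range or a custom health check URL has to be configured.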


The following issues have occurred frequently because of misconfiguration and other mistakes on our end. There are no "fixes" for these issues; we just have to be more careful.

  1. We forgot to turn off maintenance mode for some apps.
  2. Misconfigured the builds stage, causing the builds of many apps to get stuck or fail.
  3. Misconfigured the Postgres addon backups.
  4. Some apps did not get deployed because of a naming issue (the validation rules did not cover certain edge cases).
  5. Misconfigured pod-idling and the downtime service.