@ffledgling
Last active August 29, 2015 14:04
Senbonzakura Troubleshooting

Deployment and Troubleshooting

Senbonzakura/Funsize is still a fairly new application and there are probably a lot of kinks in it.
If you're changing things around or deploying it somewhere, sooner or later you'll more than likely run into errors that leave the application stuck in an unusable state. This document lists ways to get out of that state.

Important Notes

If you're dealing with this application when it's deployed via docker (locally or on Elastic Beanstalk), you will not have direct access to the container itself; you will probably only have shell access to the host. This means you cannot SSH into the container.

To make it easier to extract the cache, database and logs, a folder from the host is mounted within the container. Please take a look at the dockerrun.aws.json file for the exact locations. Typically the relevant host folder is /var/funsize.

Normally you will not be able to "stop" individual services in the container. Thus the best option when dealing with containers is to shut them down and/or destroy them.
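
As a rough sketch of what shutting down and destroying containers can look like from the host shell, the function below stops and then removes every running container. It assumes the `docker` CLI is available on the host and that every running container belongs to this application; the function name is made up for illustration.

```shell
# Stop and remove every running container on this host.
# Hypothetical helper; assumes the docker CLI is installed and that
# all running containers belong to this application.
destroy_running_containers() {
    ids=$(docker ps -q)    # IDs of running containers, one per line
    if [ -z "$ids" ]; then
        echo "no running containers"
        return 0
    fi
    for id in $ids; do
        # Only remove a container once it has stopped cleanly.
        docker stop "$id" && docker rm "$id"
    done
}
```

Because the cache, database and logs live in the folder mounted from the host, they are still there after the container is removed.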

Nuclear Option

TL;DR

# Stop everything
killall -9 python python2.7 # Kill Flask and celery
kill -9 $(ps aux | grep rabbitmq | grep -v "grep" | awk '{print $2}') # kill rabbitmq-server
rm -rf <cache location>/* # Cleanup cache
mysql -u root <database name> -e "TRUNCATE TABLE partial;" # For MySQL
rm <database file>.db # For SQLite
# Make sure virtualenv is active
<root of repo>/startup.sh

Please look at the "Things to keep in Mind" section below to help with debugging.

It's still a good idea to read through the rest of this section before copy-pasting the commands above, unless you're already aware of the pitfalls and consequences.

Full

If you don't want to spend time figuring out which bits need to be cleaned out and only want to get the application back in a working state ASAP, do the following:

  1. Stop everything that's running. This means:

    1. Stop flask (should be running as api.py)
    2. Stop celery
    3. Stop the rabbitmq-server

    A good way to do this is:

    killall -9 python python2.7 # This should get rid of flask and celery
    # Don't worry if one of python2.7 or python is not found by killall, just confirm no python is running.
    # Confirm with "ps aux | grep celery" and "ps aux | grep api.py"
    
    # Killing rabbitmq is a little trickier
    # The processes you need to kill are "beam.smp" and "epmd"
    kill -9 $(ps aux | grep rabbitmq | grep -v "grep" | awk '{print $2}') # Should ideally kill both.
    # Confirm with "ps aux | grep rabbitmq"
    
  2. Clean out the Cache
    You can simply do:

    # The location of the cache is specified in "default.ini" and "worker.ini" under senbonzakura/configs/ in the app dir.
    # The application dir in docker is /app/ by default, on your local machine it's wherever you cloned the repo
    # On docker the default cache is /perma/cache
    rm -rf <cache location>/* # Note: not rm -rf <cache location>, we need that folder to exist
    
  3. Clean out the Database
    You simply need to delete/empty the table that contains the data.

    For MySQL:

    mysql -u root <database name> -e "TRUNCATE TABLE partial;"
    

    For SQLite:

    rm <database file>.db
    
  4. Restart everything
    You can either start everything by hand, or use one of the existing scripts to start things up.
    You need to have the virtualenv which contains the repository activated before anything else.

    If you're inside docker just run the ./docker_init.sh script. If you're on your own machine run ./startup.sh. Both are at the top level of the repository.

    If you want to restart things by hand, you can use multiple tabs/panes/terminals or use &. Essentially, you need to run the following 3 commands.

    The following instructions assume you're in the root of the repository.

    rabbitmq-server # add -detached if you want to daemonize instead
    
    celery worker -A senbonzakura.backend.tasks -l INFO
    # Use -f <log file location> for logging to file, --detach to run as a daemon
    
    python senbonzakura/frontend/api.py
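
One pitfall of `killall -9 python` in step 1 is that it kills every Python process on the machine, not just Flask and celery. A narrower sketch, assuming `pkill` is installed and that the services were started with the commands from step 4, matches on the command line instead. The function name and patterns are illustrative, and rabbitmq is left to the `kill` command from step 1.

```shell
# Kill only the Flask API and the celery workers, leaving other Python processes alone.
# Hypothetical helper; the patterns assume the start commands shown in step 4.
stop_python_services() {
    for pattern in 'senbonzakura/frontend/api\.py' 'celery worker'; do
        # pkill -f matches the pattern against the full command line
        # and returns non-zero when nothing matched.
        pkill -9 -f "$pattern" || echo "nothing matched: $pattern"
    done
}
```

If the services were started some other way (for example through one of the startup scripts), the patterns may not match, so it is still worth confirming with `ps aux | grep celery` and `ps aux | grep api.py`.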
    

Other Methods

Crucial Bits

Essentially the only things that keep any sort of state are:

  1. The Database
  2. The Cache

Database

The database maintains state of the partial requests, so if the database gets corrupted, or if a partial generation aborts, then the database will prevent you from re-triggering the request.

The best non-nuclear way to clean this up is to stop the service and clean up the database. To do this, stop the running services so that no new entries are added while you're editing the database.

Next, delete all entries in the database that have the status field set to a non-zero value. You should be able to do this like so:

delete from partial where status!=0;
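
For SQLite, the same cleanup can be run from the shell. The sketch below assumes the `sqlite3` command-line tool is installed and relies only on the table and column named in the statement above; the function name is made up for illustration. It copies the database file first so the pre-cleanup state is not lost.

```shell
# Back up a SQLite database file, then delete rows with a non-zero status.
# Hypothetical helper; assumes the sqlite3 CLI is installed and that the
# table is called "partial" with a numeric "status" column, as in the text above.
clean_unfinished_partials() {
    db="$1"
    if [ ! -f "$db" ]; then
        echo "no such database file: $db" >&2
        return 1
    fi
    # Keep a copy so the pre-cleanup state can still be inspected later.
    cp -p "$db" "$db.bak" || return 1
    count=$(sqlite3 "$db" 'SELECT COUNT(*) FROM partial WHERE status != 0;') || return 1
    echo "deleting $count row(s) with non-zero status"
    sqlite3 "$db" 'DELETE FROM partial WHERE status != 0;'
}
```

Run it with the services stopped, passing the path of the database file from your configuration as the only argument.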

Cache

The cache implicitly maintains some state of the application, because if a partial exists in the cache, it means that the partial generation request was completed. Sometimes, for whatever reason, there might be a mismatch between the state tracked in the database and the one in the cache (especially after a database modification, cleanup or purge).

The best way to resolve this is to go into the cache and remove the offending partial if you know which one it is. If you don't know or don't want to know which partial is causing the problem, you can simply delete the partial sub-directory in the cache directory.

Nuking the entire cache directory also works (see Nuclear option above), but the cache directory has cached complete MARs that have been downloaded over the course of time and it's probably a good idea to keep them around unless you have reason to do otherwise.
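
If you do empty a directory in the cache, the directory itself should keep existing, and a mistyped or empty path passed to `rm -rf` can do a lot of damage. The sketch below is one hedged way to guard against both; the function name is hypothetical, and it assumes a `find` that supports `-mindepth` and `-delete` (GNU and BSD find both do).

```shell
# Remove everything inside a directory while keeping the directory itself.
# Hypothetical helper; pass the cache directory (or a sub-directory of it).
empty_dir() {
    dir="$1"
    # Refuse an empty argument or "/" so a typo cannot wipe the host.
    if [ -z "$dir" ] || [ "$dir" = "/" ]; then
        echo "refusing to empty '$dir'" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -mindepth 1 skips the directory itself; hidden files are removed too.
    find "$dir" -mindepth 1 -delete
}
```

Point it at the whole cache location for the nuclear option, or at a single sub-directory to leave the downloaded complete MARs in place.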

Things to keep in Mind

If you're working with a deployed version of the application and do not want to debug it, but would still like someone else to be able to do so later, you can do some of the following steps to help:

  1. Shut down everything
  2. Back up the database in its current state
    i.e. get a dump of the database and a copy of the configuration file currently in use, e.g. my.cnf for MySQL
  3. Make a copy of the cache in its current state.
    Find the location of the cache in the .ini files; it should be /perma/cache by default.
  4. Save a copy of the flask and celery logs as they are.
    The location of the log files is also mentioned in the .ini files (worker.ini and default.ini under senbonzakura/configs).
    By default the log is stored in /var/log/celerylog.log or /usr/local/var/log/celerylog.log.
  5. ... Anything else?
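
Steps 3 and 4 above can be sketched as a small helper that copies the cache and any log files into a timestamped directory. The function name, the snapshot directory name and the argument order are all made up for illustration; the database backup from step 2 is not covered here.

```shell
# Copy the cache and log files into a timestamped directory for later debugging.
# Hypothetical helper: arguments are the destination root, the cache directory,
# and then any number of log files.
snapshot_state() {
    dest_root="$1"
    cache_dir="$2"
    shift 2
    snap="$dest_root/funsize-snapshot-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$snap/logs" || return 1
    # -R copies recursively, -p preserves timestamps and permissions.
    cp -Rp "$cache_dir" "$snap/cache" || return 1
    for log in "$@"; do
        # Skip log files that do not exist on this machine.
        [ -f "$log" ] && cp -p "$log" "$snap/logs/"
    done
    # Print the snapshot location so it can be passed on.
    echo "$snap"
}
```

With the default locations mentioned above, a call might look like `snapshot_state /var/funsize /perma/cache /var/log/celerylog.log`, run after everything has been shut down.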

hwine commented Jul 22, 2014

re: # Stop everything
Where? in all containers? In just the flask & celery ones?

re: backup database
How? is just doing a dump viable?

re: cache (section under Other Methods)
worth repeating how to find cache (ini files)

re: things to keep in mind
where are the log files?

General questions:

  • do we ever need to drain or recreate the celery queue?
