ffledgling/funsize-troubleshoot.md

## funsize-troubleshoot.md

      
Deployment and Troubleshooting

Senbonzakura/Funsize is still a fairly new application and probably has a lot of kinks in it.

If you're changing things around or deploying it somewhere, sooner or later you will most likely
run into errors that leave the application stuck in an unusable state. This document describes
how to get out of that state.
Important Notes

If you're dealing with this application when it's deployed via docker (locally or on Elastic Beanstalk),
you will probably only have shell access to the host, not to the container itself. This means you
cannot SSH into the container.
To make it easier to extract the cache, database and logs, a folder from the host is mounted within
the container.
Please take a look at the dockerrun.aws.json file for the exact locations.
Typically the relevant host folder is /var/funsize.
Normally you will not be able to "stop" individual services in the container. Thus the best option
when dealing with containers is to shut them down and/or destroy them.
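
As a rough sketch of how to look up those locations from the host, the helper below prints the volume mapping lines from a Dockerrun file. It assumes the version 1 single-container format, where each entry under "Volumes" carries a "HostDirectory" and a "ContainerDirectory" key on its own line. The function name list_mounts is made up for illustration and is not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: print the host/container directory pairs declared in
# a Dockerrun file. Assumes the v1 format with "HostDirectory" and
# "ContainerDirectory" keys, one key per line.
list_mounts() {
    grep -E '"(HostDirectory|ContainerDirectory)"' "$1" |
        sed 's/^[[:space:]]*//; s/,[[:space:]]*$//'
}

# Usage:
#   list_mounts dockerrun.aws.json
```

If the file is laid out differently (for example, minified onto one line), a JSON-aware tool would be a better fit than grep.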
Nuclear Option

TL;DR

# Stop everything
killall -9 python python2.7 # Kill Flask and celery
kill -9 $(ps aux | grep rabbitmq | grep -v "grep" | awk '{print $2}') # kill rabbitmq-server
rm -rf <cache location>/* # Cleanup cache
mysql -u root -e "Truncate partial;" # For MySQL
rm <database file>.db # For SQLite
# Make sure virtualenv is active
<root of repo>/startup.sh

Please look at the "Things to keep in Mind" section below to help with debugging.
You should still read through at least the rest of this section before copy-pasting the
commands above, unless of course you're aware of the pitfalls/consequences.
Full

If you don't want to spend time figuring out which bits need to be cleaned out and only want to get
the application back in a working state ASAP, do the following:


Stop everything that's running. This means:

Stop flask (should be running as api.py)
Stop celery
Stop the rabbitmq-server

A good way to do this is:
killall -9 python python2.7 # This should get rid of Flask and celery
# Don't worry if killall cannot find one of python or python2.7, just confirm no python is running.
# Confirm with "ps aux | grep celery" and "ps aux | grep api.py"

# Killing rabbitmq is a little trickier
# The things you need to kill are "beam.smp" and "epmd"
kill -9 $(ps aux | grep rabbitmq | grep -v "grep" | awk '{print $2}') # Should ideally kill both.
# Confirm with "ps aux | grep rabbitmq"


Clean out the Cache

You can simply do:
# The location of the cache is specified in "default.ini" and "worker.ini" under senbonzakura/configs/ in the app dir.
# The application dir in docker is /app/ by default; on your local machine it's wherever you cloned the repo.
# On docker the default cache is /perma/cache
rm -rf <cache location>/* # Note: not rm -rf <cache location>, we need that folder to exist
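
A slightly more defensive variant of the cleanup above is sketched here. It empties the directory while keeping the directory itself, and it refuses to run if the path is empty or "/", which guards against an unset variable expanding into something destructive. The function name clean_cache is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: empty a cache directory but keep the directory itself.
# Refuses to run on an empty argument or on "/" so that a missing variable
# cannot turn into "rm -rf /*".
clean_cache() {
    dir="$1"
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "clean_cache: refusing to clean '$dir'" >&2
        return 1
    fi
    # -mindepth 1 skips the directory itself; unlike "dir/*" this also
    # removes hidden files.
    find "$dir" -mindepth 1 -delete
}

# Usage (/perma/cache is the docker default mentioned above):
#   clean_cache /perma/cache
```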


Clean out the Database
You simply need to delete/empty the table that contains the data
For MySQL:
mysql -u root -e "Truncate partial;"

For SQLite:
rm <database file>.db


Restart everything
You can either start everything by hand, or use one of the existing scripts to start
things up.

You need to have the virtualenv which contains the repository activated before anything else.
If you're inside docker, just run the ./docker_init.sh script.
If you're on your own machine, run ./startup.sh.
Both of these are at the top level of the repository.
If you want to restart things by hand, you can use multiple tabs/panes/terminals or use
&. Essentially, you need to run the following 3 commands.
The following instructions assume you're in the root of the repository.
rabbitmq-server # add -detached if you want to daemonize instead

celery worker -A senbonzakura.backend.tasks -l INFO
# Use -f <log file location> for logging to file, --detach to run as a daemon

python senbonzakura/frontend/api.py
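
If juggling three terminals is inconvenient, one alternative is a small wrapper that backgrounds each command, sends its output to a log file, and records its PID so it can be stopped later without killall. This is only a sketch: start_bg, stop_bg, and the RUN_DIR location are invented for illustration and are not part of the repository.

```shell
#!/bin/sh
# Hypothetical helpers: run a command in the background, log its output,
# and remember its PID in a file.
RUN_DIR="${RUN_DIR:-./run}"   # assumed location for pid and log files

start_bg() {
    name="$1"; shift
    mkdir -p "$RUN_DIR"
    "$@" >"$RUN_DIR/$name.log" 2>&1 &
    echo "$!" >"$RUN_DIR/$name.pid"
}

stop_bg() {
    name="$1"
    pidfile="$RUN_DIR/$name.pid"
    [ -f "$pidfile" ] || return 0
    # Plain kill sends SIGTERM, giving the process a chance to exit cleanly.
    kill "$(cat "$pidfile")" 2>/dev/null || true
    rm -f "$pidfile"
}

# Usage, from the root of the repository with the virtualenv active:
#   start_bg rabbitmq rabbitmq-server
#   start_bg celery   celery worker -A senbonzakura.backend.tasks -l INFO
#   start_bg flask    python senbonzakura/frontend/api.py
```

The recorded PID is that of the command the shell launched. For rabbitmq-server this may not cover every Erlang process it spawns, so the "ps aux | grep rabbitmq" check is still worth running after a stop.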


Other Methods

Crucial Bits

Essentially the only things that keep any sort of state are:

The Database
The Cache

Database

The database maintains the state of the partial requests, so if the database gets corrupted, or if a
partial generation aborts, then the database will prevent you from re-triggering the request.
The best non-nuclear way to clean this up is to stop the service and clean up the database.
To do this, stop the running services so that no new entries are added while you're editing the
database.
Next, delete all entries in the database that have the status field set to a non-zero value.
You should be able to do this like so:
delete from partial where status!=0;
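
For the SQLite case, the same statement can be wrapped so that a copy of the database file is kept first. The sketch below assumes the sqlite3 command-line tool is installed and that the table and column are named partial and status, as in the statement above. The function name and the ".bak" suffix are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper for the SQLite case: keep a copy of the database file,
# report how many rows will go, then remove the rows with a non-zero status.
purge_stuck_partials() {
    db="$1"
    [ -f "$db" ] || { echo "no such database file: $db" >&2; return 1; }
    cp -p "$db" "$db.bak"
    echo "rows to remove: $(sqlite3 "$db" 'SELECT COUNT(*) FROM partial WHERE status != 0;')"
    sqlite3 "$db" "DELETE FROM partial WHERE status != 0;"
}

# Usage, with the services already stopped:
#   purge_stuck_partials <database file>.db
```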

Cache

The cache implicitly maintains some state of the application, because if a partial exists in the
cache, it means that the partial generation request was completed. Sometimes, for whatever reason,
there might be a mismatch between the state tracked in the database and the one in the cache
(especially after a database modification/cleanup/purge and so on).
The best way to resolve this is to go into the cache and remove the offending partial if you know
which one it is. If you don't know or don't want to know which partial is causing the problem, you can
simply delete the partial sub-directory in the cache directory.
Nuking the entire cache directory also works (see Nuclear Option above), but the cache directory holds
complete MARs that have been downloaded over the course of time and it's probably a good idea
to keep them around unless you have reason to do otherwise.
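
If you would rather not delete anything outright, a sub-directory can be moved aside instead, so that it can be inspected or restored later. The sketch below does that and leaves an empty directory in its place. Whether the application needs the sub-directory to exist is an assumption carried over from the note about the cache folder, and set_aside is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: move a cache sub-directory aside instead of deleting
# it, leave an empty directory in its place, and print where the old one went.
set_aside() {
    dir="${1%/}"                       # drop a trailing slash, if any
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    backup="$dir.aside-$(date +%Y%m%d-%H%M%S)"
    mv "$dir" "$backup" && mkdir -p "$dir" && echo "$backup"
}

# Usage, with the services already stopped:
#   set_aside <cache location>/<sub-directory>
```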
Things to keep in Mind

If you're working with a deployed version of the application and do not want to debug it, but would
still like someone else to be able to do so later, you can do some of the following steps to help:

Shut down everything
Back up the database in its current state

i.e. get a dump of the database and the configuration file currently in use, e.g. my.cnf for MySQL
Make a copy of the cache in its current state.

Find the location of the cache in the .ini files; it should be /perma/cache by default.
Save a copy of the flask and celery logs as they are.

The location of the log files is also mentioned in the .ini files (worker.ini and default.ini under senbonzakura/configs)

By default they are stored in /var/log/celerylog.log or /usr/local/var/log/celerylog.log.
... Anything else?
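
The copy steps above could be gathered into one helper along these lines. It is a minimal sketch: snapshot_state and the "funsize-state-" directory prefix are invented for illustration, and the database dump still has to be taken separately with whatever tool matches your database.

```shell
#!/bin/sh
# Hypothetical helper: copy the cache, logs, and any other listed paths into
# a timestamped directory so the broken state can be examined later.
snapshot_state() {
    dest_root="$1"; shift
    dest="$dest_root/funsize-state-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    for path in "$@"; do
        if [ -e "$path" ]; then
            cp -a "$path" "$dest/"      # -a keeps timestamps and permissions
        else
            echo "skipping missing path: $path" >&2
        fi
    done
    echo "$dest"
}

# Usage, with the services already shut down:
#   snapshot_state /var/tmp /perma/cache /var/log/celerylog.log
```

Missing paths are reported and skipped rather than treated as fatal, since the log location differs between setups.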