Created November 30, 2010 15:58
Some thoughts about Celery and logging
======================================
LoggerAdapter
-------------
It appears that LoggerAdapter is used to get the processName into the
LogRecord (at least, that's what _CompatLoggerAdapter does), but if
LoggerAdapter is defined in logging, _CompatLoggerAdapter won't be used.
processName
-----------
There are three places in the code where the processName is written to a
LogRecord, some of which can lead to unexpected behaviour in some scenarios:

1. ColorFormatter. What happens if you don't use a ColorFormatter?
2. _CompatLoggerAdapter. This works, but only because of the coincidence that
   LoggerAdapter and multiprocessing both appeared in Python 2.6. So in
   pre-2.6 versions, _CompatLoggerAdapter does the job; in 2.6 and later,
   logging itself will update the LogRecord with processName.
3. ProcessAwareLogger. This can fail if a user does a setLoggerClass with
   their own Logger subclass.
A possible solution is to replace the above mechanisms with a filter, e.g.

    class ProcessNameFilter(logging.Filter):
        def __init__(self):
            import multiprocessing
            # use the public .name property, not the private _name attribute
            self.processName = multiprocessing.current_process().name

        def filter(self, record):
            record.processName = self.processName
            return True
You only ever need one instance of these per process:

    pnf = ProcessNameFilter()

and, where a logger is returned to a user, you could do for example:

    if pnf not in returned_logger.filters:
        returned_logger.addFilter(pnf)
    return returned_logger

This should avoid the need to use a LoggerAdapter altogether. The processName
will only be added if the event is actually logged, but that's fine.
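Put together, a runnable sketch of the filter approach (the logger name and
the capturing handler are illustrative, not Celery's):

```python
import logging
import multiprocessing

class ProcessNameFilter(logging.Filter):
    """Stamp each record with the owning process's name."""
    def __init__(self):
        # .name is the public accessor for the process name
        self.processName = multiprocessing.current_process().name

    def filter(self, record):
        record.processName = self.processName
        return True

# Only one instance is ever needed per process.
pnf = ProcessNameFilter()

logger = logging.getLogger("celery.demo")  # illustrative name
if pnf not in logger.filters:
    logger.addFilter(pnf)

# A capturing handler, purely to show the filter at work.
captured = []
class CapturingHandler(logging.Handler):
    def emit(self, record):
        captured.append(record)

logger.addHandler(CapturingHandler())
logger.warning("task started")
```

Because the filter sits on the logger, the attribute is set only when an
event actually passes the level check and is about to be handled.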
Problems with stderr writes and infinite loops
----------------------------------------------
Logging only ever prints exceptions to sys.stderr where the value of
logging.raiseExceptions is true (which it is by default, in order to catch
errors during development), or if no handlers are found for a logging call.
The latter can be avoided by adding a NullHandler to the top-level Celery
logger and also to task loggers where these are not below the top-level
Celery logger. (Note that in general, the recommendation is to use __name__
to create module-level loggers and, where necessary, to create loggers below
the module-level logger for more granularity, or occasionally a non-module-
named logger for some cross-cutting concern across modules, such as
"sql_executed" in a database application.)
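As a sketch of the NullHandler part (the "celery" logger name is an
assumption on my part; NullHandler is in the stdlib only from Python
2.7/3.1, so older Pythons need the usual stub):

```python
import logging

try:
    from logging import NullHandler
except ImportError:
    # Stand-in for Pythons that predate logging.NullHandler
    class NullHandler(logging.Handler):
        def emit(self, record):
            pass

# With this in place, events logged to "celery" or any logger below it
# always find at least one handler, so the "no handlers could be found"
# message can never be triggered for them.
logging.getLogger("celery").addHandler(NullHandler())
```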
You could recommend that users set logging.raiseExceptions to False, but I
accept that (a) in development you could still get into trouble, and (b)
users might not do this in production either, and of course you don't want
infinite loops!
One way round it is to have a LoggingProxy as you do now, but change the way
it works. I think the best way of logging with multiprocessing is to use a
multiprocessing queue, as described here:

http://plumberjack.blogspot.com/2010/09/using-logging-with-multiprocessing.html
http://plumberjack.blogspot.com/2010/09/improved-queuehandler-queuelistener.html
So, the main Celery process creates a queue and initialises a QueueListener
with it, and the queue is also passed to all worker processes. These just have
a QueueHandler configured (no other handlers at all) which writes any events
logged by tasks to that queue. The main process can take any logging
configuration (e.g. via a config file) and log stuff to the appropriately
configured destinations, via a QueueListener which is initialised with a
handler function something like this:
    def handle_log_event(record):
        logger = logging.getLogger(record.name)
        logger.handle(record)

which will automatically make use of the active logging configuration in the
main process.
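A sketch of the whole arrangement, using the QueueHandler/QueueListener
classes from those posts (they later landed in logging.handlers in Python
3.2, which is what this sketch uses); the worker setup and logger names are
illustrative, and the stdlib QueueListener is subclassed here because it
dispatches to a fixed set of handlers rather than to a handler function:

```python
import logging
import logging.handlers
import multiprocessing

def handle_log_event(record):
    # Re-handle the record under the main process's configuration.
    logging.getLogger(record.name).handle(record)

class DispatchingListener(logging.handlers.QueueListener):
    """Dispatch each dequeued record by record.name instead of
    feeding it to a fixed set of handlers."""
    def handle(self, record):
        handle_log_event(record)

def worker(q):
    # In each worker, the root logger gets a QueueHandler and nothing
    # else; all events flow back to the main process via the queue.
    root = logging.getLogger()
    root.handlers[:] = [logging.handlers.QueueHandler(q)]
    root.setLevel(logging.INFO)
    logging.getLogger("tasks.demo").info("work done")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    listener = DispatchingListener(q)
    listener.start()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    p.join()
    listener.stop()
```

The main process is then free to configure "tasks.*" loggers however it
likes; records logged in workers come out through that configuration.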
Summary:

Worker processes and code running in tasks all log to whatever logger
they want, but only one handler is provided on the root logger: a QueueHandler
which writes all events to the queue. You can attach a ProcessNameFilter to
this handler for Pythons earlier than 2.6, which automatically adds the
processName before enqueuing the record. Workers can take account of the
logging levels which are configured, to avoid generating logging events
needlessly. I think there is no need to provide things like filenames,
propagation, colorizing settings etc. to task loggers.
In the main process, a QueueListener dequeues these records and sends them
to the handlers of the appropriately named loggers. This allows the main
Celery process to be set up with a completely custom logging configuration,
so that user logs end up wherever they want them to go, but contain correct
info about the process in which they ran. You can attach a colorising
formatter to a console handler configured here, of course.
The above was generated following a relatively brief look through the code,
so I may have missed something; please let me know if so.