@GabiThume
Created May 1, 2013 07:35
<Elrond> paroneayea - Hi back. [13:42]
<paroneayea> hey Elrond !
<Elrond> paroneayea - Reprocessing? [13:44]
<paroneayea> Elrond: yeah, let's talk about it :) [13:45]
ERC> /query joar [13:46]
<Elrond> I think, it's quite easy, IF media-types use the proc_state tools.
[13:47]
<Elrond> proc_state.get_queue_file() -- get from queue for normal, get from
main storage for reprocessing.
<Elrond> proc_state.delete_queue_file() -- delete if the queue file was on the
queue dir, otherwise do nothing. [13:48]
<Elrond> proc_state.store_public("thumb", ...) -- upload new file for main
processing, replace old file for reprocessing.
<Elrond> So for reprocessing, you call the media-types with a subclassed and
changed proc_state and they'll do reprocessing for you. [13:49]
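
[Editor's sketch, not part of the log] The idea Elrond describes above could look roughly like the following. Only the method names get_queue_file, delete_queue_file and store_public come from the conversation; the class names, the constructor, the dict-based storage and the process_image function are hypothetical stand-ins and are not MediaGoblin's actual code.

```python
class ProcessingState:
    """Hypothetical proc_state for first-time processing.

    Storage is modelled with plain dicts to keep the sketch self-contained:
    queue maps entry_id -> bytes, public maps (entry_id, key) -> bytes.
    """

    def __init__(self, queue, public, entry_id):
        self.queue = queue
        self.public = public
        self.entry_id = entry_id

    def get_queue_file(self):
        # Normal processing: the upload is waiting in the queue.
        return self.queue[self.entry_id]

    def delete_queue_file(self):
        # Normal processing: the queued upload is no longer needed.
        self.queue.pop(self.entry_id, None)

    def store_public(self, key, data):
        # A dict assignment both creates a new file and replaces an old one.
        self.public[(self.entry_id, key)] = data


class ReprocessingState(ProcessingState):
    """Hypothetical subclass: same interface, different file source."""

    def get_queue_file(self):
        # Reprocessing: read the kept original from main storage instead.
        return self.public[(self.entry_id, "original")]

    def delete_queue_file(self):
        # Nothing was taken from the queue, so there is nothing to delete.
        pass


def process_image(proc_state):
    """Hypothetical media-type code; it only ever talks to proc_state."""
    data = proc_state.get_queue_file()
    proc_state.store_public("original", data)
    proc_state.store_public("thumb", data[:4])  # placeholder for real resizing
    proc_state.delete_queue_file()
```

Under these assumptions the media-type code stays unchanged: calling process_image(ProcessingState(...)) does the first run, and calling process_image(ReprocessingState(...)) later regenerates "thumb" from the stored original.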
<Elrond> One point: Not all media-types use all of those tools. [13:50]
<Elrond> The thing gets interesting in some details: For example: What if
reprocessing fails, but the original processing did not fail? [13:52]
*** slikdigit_away (~freefac@rose.makesad.us) is now known as slikdigit
[13:53]
<Elrond> And it gets really interesting in the UI perspective: Who should be
allowed to kick off reprocessing under what circumstances? A failed
first processing might likely fail the next time too. But maybe not,
because the admin finally added more disk space or installed the
correct codec. [13:54]
<Elrond> And reprocessing an already processed media usually does not make
much sense, unless paroneayea and Elrond changed the default jpeg
quality. ;) [13:55]
<paroneayea> Elrond: well
<paroneayea> I think that there are two motivations for reprocessing.
<paroneayea> (and they might even be two different tools in a sense...)
*** AVRS (~Aleksej@wikimedia/AVRS) has quit: Quit: leaving
<paroneayea> 1) processing failed! try again!
*** gandaro (~gandaro@wikipedia/Gorlingor) has left channel #mediagoblin:
"Leaving" [13:56]
<Elrond> Right.
<paroneayea> 2) we changed the size of our "medium" size images! or... we
have a new video codec that we support! or... we have
geolocation support now, extract exif data from old media! (yeah
yeah I know) or... we now support *multiple* sizes of media, go
get those! update the relevant old media.
<paroneayea> and in the case of 2, it gets complicated [13:57]
<paroneayea> - surely there may be cases where you update some, but not all
of old media
<paroneayea> - what if the original wasn't kept?
<Elrond> No original --> no reprocessing, right.
<paroneayea> - how do you specify what media to re-process? who does it? a
./bin/gmg user? a user from the website? [13:58]
<paroneayea> I would say for now, only ./bin/gmg type users is enough, but
even there
<paroneayea> how do you "query" for the scope of either
<Elrond> Right. All those "UI" related things make it really interesting.
<paroneayea> - notifying users that there's an opportunity to reprocess
things
<paroneayea> - let users tell what conditions they want things reprocessed
under [13:59]
<paroneayea> - reprocessing specific ones????
<Elrond> Right, right.
<paroneayea> this is one reason I think this is actually a worthwhile GSoC
project :)
<paroneayea> though a hard one
<Elrond> Well, it's hard, because you have to figure out the answers. If you
have the answers, it's quite okay, probably. [14:00]
<paroneayea> right
<paroneayea> one reason why I think this is hard for most gsoc students maybe
<paroneayea> is unlike other ones, this one especially involves thinking about
the fundamental architecture of mediagoblin
<Elrond> Yes. [14:01]
<Elrond> BTW: There's yet another option: "on demand reprocessing". media
might have an .would_like_to_be_reprocessed() and if True, it would
get added to the queue, when/if a user views it. (so that the next
view at least gets a new one) [14:02]
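
[Editor's sketch, not part of the log] A minimal illustration of the "on demand reprocessing" option Elrond describes above. Only the method name would_like_to_be_reprocessed comes from the conversation; the Media class, the version counter used as the "is this stale?" rule, the in-memory queue and the view function are all assumptions made for the example.

```python
from collections import deque

# Hypothetical: bumped whenever the processing settings change
# (new thumbnail size, new codec, and so on).
CURRENT_PROCESSING_VERSION = 2


class Media:
    """Hypothetical media entry that remembers which settings produced it."""

    def __init__(self, media_id, processed_version):
        self.media_id = media_id
        self.processed_version = processed_version

    def would_like_to_be_reprocessed(self):
        # Assumed rule: stale if processed with older settings.
        return self.processed_version < CURRENT_PROCESSING_VERSION


reprocess_queue = deque()  # ids waiting for a background worker
queued_ids = set()         # avoids queuing the same entry twice


def view(media):
    """Serve the existing files now; queue a refresh for the next view."""
    if media.would_like_to_be_reprocessed() and media.media_id not in queued_ids:
        queued_ids.add(media.media_id)
        reprocess_queue.append(media.media_id)
    return media.processed_version
```

With this shape, a large collection is only reprocessed for the entries that are actually viewed, and the current viewer still gets the old files without waiting.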
<paroneayea> hm [14:04]
<paroneayea> that one seems too messy to me
<Elrond> Think of odinho's 20000 images. You don't want those reprocessed all
at once. You want them processed on an as needed basis. [14:05]
<Elrond> He even considered using reprocess for "initial processing".
<Elrond> ... as in "put original in storage, forget about processing".
<paroneayea> well, I think "consider using reprocess for initial processing"
is relevant in a sense to the implications of what reprocessing
means as a project
<paroneayea> what it *really* means is rethinking the way processing works to
make it more modular
<Elrond> Ahh, yeah. [14:06]
<paroneayea> I'm still surprised how fast and even how far my initial
processing "sketch" got us.
<paroneayea> I had a clear idea of separating out processing, but then I wrote
the image processing code as a "proof of concept", and that proof
of concept grew into reality
<paroneayea> :)
<Elrond> That's why I started this proc_state thing, to make processing more
re-usable.
*** BjarniRunar (~bre@46-239-202-21.tal.is) has quit: Ping timeout: 240
seconds
<paroneayea> I actually think that's also partly because it was a better
design than I thought
<paroneayea> right
<paroneayea> I think a good way of thinking of proc_state [14:07]
<paroneayea> is like a "request object" for processing
<Elrond> Right, it is.
<Elrond> Maybe that's even a better name for it.
<paroneayea> well that might confuse people in its own way, but I think maybe
the docstrings should be changed to reflect it [14:08]
<Elrond> Yes. [14:09]
<Elrond> proc_state has no doc string currently anyway. ;o)
<Elrond> paroneayea - But you're right: This project might be merged with the
pluginification of media-types. Because that one feels a bit "quick
and easy". [14:11]
<paroneayea> right
<paroneayea> hm, yeah
<paroneayea> I think we should update the gsoc page to reflect that.
<paroneayea> I can do so
<Elrond> That would be nice, if you could do that. [14:12]
<Elrond> Possibly paste some of the dashy points from you above. Those are the
real big issues of this project.