gwicke commented Nov 13, 2015

[15:33] gwicke: hey
[15:38] ebernhardson: hey, I was just looking into php kafka producer options, and ottomata pointed me to https://github.com/wikimedia/mediawiki-vendor/tree/master/nmred/kafka-php
[15:39] he mentioned that it skips the zookeeper stuff, so I was wondering what the implications of that are
[15:39] gwicke: yes, that's what we are currently using in production. i stripped out the zookeeper part because it was only using zookeeper to get a list of brokers that were active for a partition
[15:40] gwicke: kafka added an api recently to get that info directly from kafka and skip zookeeper
[15:40] gwicke: so, in short, there should be no downside; it's just getting the data directly from kafka instead of from zookeeper. I didn't look into the kafka side of things, but i'm imagining kafka probably queries zookeeper for you
[15:40] okay, so it'll still handle master fail-over etc?
[15:40] gwicke: yes
[15:41] I see
[15:41] and this is faster than talking to ZK?
[15:41] gwicke: not sure about faster, but there were no good php level libraries for talking to zookeeper, we would have had to port a C level php module to hhvm
[15:41] weiboad/kafka-php#17
[15:42] it seemed that the info is queried per request
[15:43] gwicke: at least in the code i wrote it is cached inside the php process, but not across processes: https://github.com/nmred/kafka-php/blob/master/src/Kafka/MetaDataFromKafka.php#L120
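The per-process caching ebernhardson describes can be sketched in plain PHP: the first lookup in a request fetches metadata from a broker, and later lookups in the same process reuse the cached copy. This is only an illustration of the pattern, not the linked library's actual code; `BrokerMetadataCache` and the `$fetchFromBroker` callable are hypothetical stand-ins for the real Kafka metadata request.

```php
<?php
// Illustration of per-process metadata caching: the fetcher runs at most
// once per topic within a single PHP process/request, never across requests.
class BrokerMetadataCache {
    /** @var array<string, array> cached for this PHP process only */
    private static $cache = [];

    public static function get(string $topic, callable $fetchFromBroker): array {
        if (!isset(self::$cache[$topic])) {
            // First request for this topic: hit the broker (stand-in callable).
            self::$cache[$topic] = $fetchFromBroker($topic);
        }
        return self::$cache[$topic];
    }
}

// Usage: the fetcher is invoked only once for repeated lookups.
$calls = 0;
$fetch = function (string $topic) use (&$calls): array {
    $calls++;
    return ['topic' => $topic, 'partitions' => [0, 1, 2]];
};
BrokerMetadataCache::get('mediawiki.edit', $fetch);
BrokerMetadataCache::get('mediawiki.edit', $fetch);
echo $calls; // 1
```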
[15:43] so this is per PHP web request?
[15:43] or across requests?
[15:43] gwicke: yes
[15:44] gwicke: per php request
[15:44] kk
[15:44] gwicke: in prod this is done after closing the request to the user (register_postsend_function) so there is no user-visible latency
[15:44] we are targeting fairly low volume stuff in any case (edit events), so it's probably fine
[15:44] well, it happens that way indirectly by using the 'buffer' flag in monolog on the channel, which pushes into DeferredUpdates, which uses register_postsend_function
[15:45] so for edit events, you would want to do similar with DeferredUpdates most likely
[15:45] yeah, accumulate & then flush in a DeferredUpdate
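The accumulate-and-flush pattern agreed on above can be sketched without MediaWiki: buffer events in memory during the request, then flush them in one batch from a callback that runs after the response is sent (in MediaWiki that callback would be scheduled via DeferredUpdates). The `EventBuffer` class and the `$sendToKafka` callable here are hypothetical stand-ins for the real producer call, not the production code.

```php
<?php
// Sketch of buffering events during a request and flushing once afterwards.
// In MediaWiki the flush would be queued in DeferredUpdates, which runs
// after the response is closed, so the user sees no added latency.
class EventBuffer {
    /** @var array[] events accumulated during the current request */
    private $events = [];

    public function add(array $event): void {
        $this->events[] = $event;
    }

    /** Flush all buffered events in one batch via the supplied sender. */
    public function flush(callable $sendToKafka): int {
        $count = count($this->events);
        if ($count > 0) {
            $sendToKafka($this->events); // one producer call per request
            $this->events = [];
        }
        return $count;
    }
}

// Usage: accumulate during the request, flush once in a post-send callback.
$buffer = new EventBuffer();
$buffer->add(['type' => 'edit', 'page' => 'Main_Page']);
$buffer->add(['type' => 'edit', 'page' => 'Sandbox']);
$sent = [];
$flushed = $buffer->flush(function (array $batch) use (&$sent) {
    $sent = $batch;
});
echo $flushed; // 2
```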
[15:46] thanks, sounds like we have one more option for getting those events into kafka
[15:46] excellent, np
[15:46] i'd also like to use these events for making a more robust update process for elasticsearch, but that's probably a bit far off on the horizon :)
[15:46] we are trying hard to get this out before Christmas
[15:47] wish us luck ;)
[15:47] schemas are under discussion at wikimedia/restevent#5
