[15:33] gwicke: hey
[15:38] ebernhardson: hey, I was just looking into php kafka producer options, and ottomata pointed me to https://github.com/wikimedia/mediawiki-vendor/tree/master/nmred/kafka-php
[15:39] he mentioned that it skips the zookeeper stuff, so I was wondering what the implications of that are
[15:39] gwicke: yes, that's what we are currently using in production. i stripped out the zookeeper part because it was only using zookeeper to get a list of brokers that were active for a partition
[15:40] gwicke: kafka added an api recently to get that info directly from kafka and skip zookeeper
[15:40] gwicke: so, in short there should be no downside; it's just getting the data directly from kafka instead of from zookeeper. I didn't look into the kafka side of things but I'm imagining kafka probably queries zookeeper for you
[15:40] okay, so it'll still handle master fail-over etc?
[15:40] gwicke: yes
[15:41] I see
[15:41] and this is faster than talking to ZK?
[15:41] gwicke: not sure about faster, but there were no good php level libraries for talking to zookeeper, we would have had to port a C level php module to hhvm
[15:41] weiboad/kafka-php#17
[15:42] it seemed that the info is queried per request
[15:43] gwicke: at least in the code i wrote it is cached inside the php process, but not across processes: https://github.com/nmred/kafka-php/blob/master/src/Kafka/MetaDataFromKafka.php#L120
[15:43] so this is per PHP web request?
[15:43] or across requests?
[15:43] gwicke: yes
[15:44] gwicke: per php request
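The caching behavior described above (broker metadata fetched from Kafka's metadata API and cached inside one PHP process for the duration of a request, but not shared across requests) can be sketched roughly as follows. This is an illustrative sketch in Python, not the actual nmred/kafka-php code; the class and function names are hypothetical stand-ins.

```python
# Sketch of per-process (per-request) broker metadata caching, in the
# spirit of nmred/kafka-php's MetaDataFromKafka. Names are illustrative.

class MetadataCache:
    def __init__(self, fetch_fn):
        self._fetch = fetch_fn   # queries a broker's metadata API
        self._cache = {}         # topic -> partition/leader info

    def get(self, topic):
        # Cached only for the lifetime of this process/request;
        # a new request starts over with an empty cache.
        if topic not in self._cache:
            self._cache[topic] = self._fetch(topic)
        return self._cache[topic]

fetch_calls = []

def fake_fetch(topic):
    # Stands in for a real metadata request to a Kafka broker.
    fetch_calls.append(topic)
    return {"partitions": [0, 1], "leaders": {0: "broker-1", 1: "broker-2"}}

cache = MetadataCache(fake_fetch)
cache.get("edit-events")
cache.get("edit-events")  # second call is served from the in-process cache
```

Because the broker metadata includes current partition leaders, refetching it at the start of each request is also what lets the client pick up leader fail-over without talking to ZooKeeper.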
[15:44] kk
[15:44] gwicke: in prod this is done after closing the request to the user (register_postsend_function) so there is no user-visible latency
[15:44] we are targeting fairly low volume stuff in any case (edit events), so it's probably fine
[15:44] well, it happens that way indirectly by using the 'buffer' flag in monolog on the channel, which pushes into DeferredUpdates, which uses register_postsend_function
[15:45] so for edit events, you would want to do similar with DeferredUpdates most likely
[15:45] yeah, accumulate & then flush in a DeferredUpdate
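The accumulate-and-flush pattern being discussed here (buffer events during the request, then push a single flush into DeferredUpdates so it runs after the response has been sent) might look roughly like this. This is a hedged sketch in Python; `deferred` and `BufferedProducer` are illustrative stand-ins, not MediaWiki's actual DeferredUpdates API.

```python
# Sketch of the buffer-then-flush-after-response pattern described above.
# "deferred" stands in for MediaWiki's DeferredUpdates queue; all names
# here are illustrative, not the real API.

deferred = []  # callables run after the response has been sent

class BufferedProducer:
    def __init__(self, send_batch):
        self._send = send_batch
        self._buffer = []
        self._scheduled = False

    def produce(self, event):
        self._buffer.append(event)
        if not self._scheduled:
            # Schedule exactly one flush for after the request completes,
            # so the user sees no Kafka-related latency.
            deferred.append(self.flush)
            self._scheduled = True

    def flush(self):
        if self._buffer:
            self._send(self._buffer)  # one batched send to Kafka
            self._buffer = []
        self._scheduled = False

sent = []
producer = BufferedProducer(lambda batch: sent.append(list(batch)))
producer.produce({"type": "edit", "page": "Main_Page"})
producer.produce({"type": "edit", "page": "Sandbox"})

# After the response is closed, the deferred updates run:
for fn in deferred:
    fn()
```

The design point is that all events from one request are sent as a single batch after the response is closed, which is why the latency of the Kafka round-trip never reaches the user.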
[15:46] thanks, sounds like we have one more option for getting those events into kafka
[15:46] excellent, np
[15:46] i'd also like to use these events for making a more robust update process for elasticsearch, but that's probably a bit far off on the horizon :)
[15:46] we are trying hard to get this out before Christmas
[15:47] wish us luck ;)
[15:47] schemas are under discussion at wikimedia/restevent#5