I've been working with Apache Kafka for over 7 years. I inevitably find myself doing the same set of activities while I'm developing or working with someone else's system. Here's a set of Kafka productivity hacks for doing a few things way faster than you're probably doing them now. 🔥
# -*- coding: utf-8 -*-
""" rwlock.py
A class to implement read-write locks on top of the standard threading
library.
This is implemented with two mutexes (threading.Lock instances) as per this
Wikipedia pseudocode:
https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock#Using_two_mutexes
Kafka 0.11.0.0 (Confluent 3.3.0) added support for manipulating a consumer group's offsets via the kafka-consumer-groups CLI command.
- List the topics to which the group is subscribed
kafka-consumer-groups --bootstrap-server <kafkahost:port> --group <group_id> --describe
Note the values under "CURRENT-OFFSET" and "LOG-END-OFFSET". "CURRENT-OFFSET" is the consumer group's current position in each partition; "LOG-END-OFFSET" is the offset at the end of that partition's log.
- Reset the consumer offset for a topic (preview)
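The preview step can be scripted. Below is a minimal sketch in Python, assuming the kafka-consumer-groups tool is on the PATH. The helper name build_reset_command, the sample host, group, and topic, and the choice of resetting to the earliest offset are all assumptions made for illustration. The helper only assembles the argument list, and it passes --dry-run so nothing changes until that flag is swapped for --execute.

```python
import shlex


def build_reset_command(bootstrap_server, group_id, topic, execute=False):
    """Assemble a kafka-consumer-groups call that resets one topic's offsets.

    Hypothetical helper for illustration: it builds the argument list
    and does not run anything itself.
    """
    return [
        "kafka-consumer-groups",
        "--bootstrap-server", bootstrap_server,
        "--group", group_id,
        "--topic", topic,
        "--reset-offsets",
        "--to-earliest",  # assumed target; other reset targets exist
        # --dry-run only prints the planned offsets; --execute applies them.
        "--execute" if execute else "--dry-run",
    ]


if __name__ == "__main__":
    # Sample values are placeholders, not real hosts or groups.
    cmd = build_reset_command("localhost:9092", "my-group", "my-topic")
    print(shlex.join(cmd))
```

Passing the list to subprocess.run would execute it. The group generally needs to be inactive (no running consumers) for a reset to be accepted.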
I recently happened upon a very interesting implementation of popen() (different API, same idea) called popen-noshell using clone(2), and so I opened an issue requesting use of vfork(2) or posix_spawn() for portability. It turns out that on Linux there's an important advantage to using clone(2). I think I should capture the things I wrote there in a better place. A gist, a blog, whatever.
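For a feel of the posix_spawn() style of process creation, here is a minimal sketch using Python's os.posix_spawn wrapper (Python 3.8+ on a POSIX system). It is not popen-noshell's API, and the helper name spawn_and_wait is made up for illustration. How the C library creates the child underneath (vfork, clone, or something else) depends on the platform.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] via os.posix_spawn and return the child's exit code.

    Hypothetical helper for illustration. Unlike os.fork(), no Python
    code runs in a copy of the parent; the C library creates the child
    and executes the program directly.
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # Spawn the current interpreter and have it exit with a known code.
    code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exit code:", code)
```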
This is not a paper. I assume reader familiarity with fork() in particular and Unix in general, though, of course, I link to relevant wiki pages, so if the unfamiliar reader is willing to go down the rabbit hole, they should be able to come ou
- Name: Suyash Garg
- IRC Nick: ferbncode
- Email: suyashgargsfam@gmail.com
- GitHub: https://github.com/ferbncode
- LinkedIn: https://linkedin.com/in/gargsuyash
Presently, CritiqueBrainz uses python-musicbrainzngs to show search results and fetch info about selected entities. python-musicbrainzngs in turn uses the XML Web Service, which returns the requested results. This is not very slow, but some pages on CritiqueBrainz require a lot of MusicBrainz data, which takes a very long time to retrieve. For example, as suggested in CB-162, if the cache is empty and each user requests separate pages of the review browsing section, then there are 330 × 27 requests to the web service. Directly accessing the MusicBrainz database would mean one query (directly to the database) per page (by getting multiple entities' data in a batch using raw SQL statements), and only if the cache is completely empty. Thus, d
package main

import (
	"fmt"
	"reflect"
)

// Name of the struct tag used in examples
const tagName = "validate"
Since many deployments start out with 3 nodes, and so little is known about how to grow a cluster from 3 members to 5 members without losing the existing quorum, here is an example of how this might be achieved.
In this example, all 5 nodes run on the same Vagrant host for the purpose of illustration, each with its own configuration (ports and data directories) and without any actual client load.
YMMV. Caveat usufructuarius.
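The quorum arithmetic behind the 3-to-5 move can be sketched in a few lines of Python. This assumes a plain majority quorum (more than half of the members), which is an assumption here rather than something stated above; the function names are made up for illustration.

```python
def quorum_size(members):
    """Smallest number of nodes that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many nodes can be down while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Walk through the sizes seen if members are added one at a time.
    for n in (3, 4, 5):
        print(f"{n} members: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, the intermediate 4-member stage needs 3 nodes for quorum but still tolerates only one failure, so it is worth moving through it quickly.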
-- show running queries (pre 9.2)
SELECT procpid, age(clock_timestamp(), query_start), usename, current_query
FROM pg_stat_activity
WHERE current_query != '<IDLE>' AND current_query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
-- show running queries (9.2+); idle sessions are marked in the state column
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state != 'idle' AND query NOT ILIKE '%pg_stat_activity%'