@dsmiley
Created April 26, 2024 05:32
Lucene/Solr: Parse contributors from our CHANGES.txt for easy sharing / tabulation.
import re
from collections import defaultdict
data = """
================== 9.6.0 ==================
New Features
---------------------
* SOLR-17141: Implement 'cpuAllowed' query parameter to limit the maximum CPU usage by a running query. (Andrzej Bialecki, Gus Heck, David Smiley)
* SOLR-599: Add a new SolrJ client using the JDK’s built-in Http Client. (James Dyer)
* SOLR-16403: A new cluster singleton plugin to automatically remove inactive shards. (Paul McArthur, David Smiley)
* SOLR-16466: Admin UI - Make it optional to sort list of commandline args (Shawn Heisey, Vincenzo D'Amore via Christine Poerschke)
etc.
"""
# Initialize a default dictionary to store contributors and their counts
contributors = defaultdict(int)
# Join everything onto one line, then split into entries on the '*' bullet marker
lines = data.replace('\n', ' ').split('*')
# Regular expression to find contributors
pattern = re.compile(r'\((.*?)\)')
for line in lines:
    # Find all contributors in the line
    matches = pattern.findall(line)
    if matches:
        for match in matches:
            # might have a "via" committer; we only want the author here
            match = match.split(" via ")[0]  # keep left side
            # Split the contributors by comma and strip whitespace
            for contributor in match.split(','):
                contributor = contributor.strip()
                contributors[contributor] += 1
# Print the contributors and their counts
for contributor, count in sorted(contributors.items(), key=lambda item: item[1], reverse=True):
    print(f'{contributor}: {count}')
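The script above counts every parenthesized group in an entry, so a parenthetical aside inside a description would also be tallied as a name. A minimal sketch of one possible variation, assuming the credits are always the last parenthesized group of an entry: the `count_contributors` helper, the `TRAILING_CREDITS` pattern, and the second sample entry (with its `SOLR-0000` ID) are made up for illustration and are not part of the original gist.

```python
import re
from collections import Counter

# Hypothetical pattern (not in the original gist): match only a
# parenthesized group that sits at the very end of an entry.
TRAILING_CREDITS = re.compile(r'\(([^()]*)\)\s*$')


def count_contributors(changes_text):
    """Count authors named in the trailing credits of each '*' entry."""
    counts = Counter()
    # Join everything onto one line, then split on the '*' bullet marker.
    # The chunk before the first '*' is header text, so it is skipped.
    for entry in changes_text.replace('\n', ' ').split('*')[1:]:
        found = TRAILING_CREDITS.search(entry.strip())
        if not found:
            continue
        authors = found.group(1).split(' via ')[0]  # drop the committer
        for name in authors.split(','):
            name = name.strip()
            if name:
                counts[name] += 1
    return counts


# The second entry is invented to show a parenthetical aside being ignored.
sample = """
* SOLR-16466: Admin UI - Make it optional to sort list of commandline args (Shawn Heisey, Vincenzo D'Amore via Christine Poerschke)
* SOLR-0000: Made-up entry (with a parenthetical aside) in the description. (David Smiley)
"""

for name, count in count_contributors(sample).most_common():
    print(f'{name}: {count}')
```

A trade-off of this sketch: an entry followed by trailing text, such as the next section header or the `etc.` in the sample data above, no longer ends with its credits and would be skipped.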
@janhoy

janhoy commented Apr 26, 2024

Tested the script, works well. I guess there's a risk that spelling variations in names etc. will pop up on larger data sets.
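A rough sketch of how the simplest spelling variations might be folded together, assuming they differ only in letter case or spacing. The `name_key` and `merge_variants` helpers and the counts in `raw` are hypothetical and not part of the gist. Typos, nicknames, and reordered names would still need a hand-maintained alias table.

```python
from collections import Counter


def name_key(name):
    """Hypothetical normalization: collapse whitespace and ignore case."""
    return ' '.join(name.split()).casefold()


def merge_variants(counts):
    """Merge counts whose names share a key; keep the most frequent spelling."""
    totals = Counter()
    spellings = {}
    for name, count in counts.items():
        key = name_key(name)
        totals[key] += count
        spellings.setdefault(key, Counter())[name] += count
    return {
        spellings[key].most_common(1)[0][0]: total
        for key, total in totals.items()
    }


# Made-up counts, only to illustrate the merge.
raw = {
    'David Smiley': 4,
    'David  Smiley': 1,
    "Vincenzo D'Amore": 2,
    "vincenzo d'amore": 1,
}

for name, count in sorted(merge_variants(raw).items(), key=lambda item: item[1], reverse=True):
    print(f'{name}: {count}')
```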

@dsmiley
Author

dsmiley commented Apr 26, 2024

Replaced by apache/solr#2424 which is improved.
