Created
April 26, 2024 05:32
Lucene/Solr: Parse contributors from our CHANGES.txt for easy sharing / tabulation.
import re
from collections import defaultdict

data = """
================== 9.6.0 ==================
New Features
---------------------
* SOLR-17141: Implement 'cpuAllowed' query parameter to limit the maximum CPU usage by a running query. (Andrzej Bialecki, Gus Heck, David Smiley)
* SOLR-599: Add a new SolrJ client using the JDK’s built-in Http Client. (James Dyer)
* SOLR-16403: A new cluster singleton plugin to automatically remove inactive shards. (Paul McArthur, David Smiley)
* SOLR-16466: Admin UI - Make it optional to sort list of commandline args (Shawn Heisey, Vincenzo D'Amore via Christine Poerschke)
etc.
"""

# Initialize a default dictionary to store contributors and their counts
contributors = defaultdict(int)

# Flatten newlines to spaces, then split on '*' so each chunk is one change entry
lines = data.replace('\n', ' ').split('*')

# Regular expression to find parenthesized contributor lists
pattern = re.compile(r'\((.*?)\)')

for line in lines:
    # Find all contributors in the line
    matches = pattern.findall(line)
    if matches:
        for match in matches:
            # might have a "via" committer; we only want the author here
            match = match.split(" via ")[0]  # keep left side
            # Split the contributors by comma and strip whitespace
            for contributor in match.split(','):
                contributor = contributor.strip()
                contributors[contributor] += 1

# Print the contributors and their counts, most frequent first
for contributor, count in sorted(contributors.items(), key=lambda item: item[1], reverse=True):
    print(f'{contributor}: {count}')
Tested the script; it works well. I guess there's a risk that spelling variations of names will pop up on larger data sets.
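One possible way to reduce that risk is to fold obvious variants together before tabulating. The following is only a sketch under stated assumptions: `normalize_key`, `merge_variants`, and the `ALIASES` table are hypothetical names that are not part of the gist, and the alias entry is purely illustrative. It merges names that differ only in case, spacing, or combining accents, and lets a hand-maintained alias table catch the rest.

```python
import unicodedata
from collections import defaultdict

# Hypothetical, hand-maintained alias table (illustrative entry only):
# maps a known variant to the preferred spelling.
ALIASES = {
    "dsmiley": "David Smiley",
}


def normalize_key(name):
    """Return a comparison key: combining accents stripped, case folded, whitespace collapsed."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def merge_variants(counts):
    """Merge counts for names sharing a normalized key; report the most frequent spelling."""
    alias_by_key = {normalize_key(k): v for k, v in ALIASES.items()}
    totals = defaultdict(int)
    spellings = defaultdict(lambda: defaultdict(int))
    for name, count in counts.items():
        key = normalize_key(name)
        canonical = alias_by_key.get(key)
        if canonical is not None:
            # Known alias: count it under the preferred spelling's key
            key = normalize_key(canonical)
        totals[key] += count
        spellings[key][canonical or name] += count
    merged = {}
    for key, total in totals.items():
        # Pick the spelling that accounts for the largest share of the count
        best = max(spellings[key].items(), key=lambda kv: kv[1])[0]
        merged[best] = total
    return merged
```

Calling `merge_variants(contributors)` before the final print loop would apply it to the script's tallies. Letters that have no Unicode decomposition (for example `ł` or `ø`) are not folded by this approach, so such variants, along with nicknames and initials, would still need an alias entry.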