Source code readability and comprehensibility have attracted increasing interest in recent years, due to the wide adoption of component-based software development and the (re)use of software residing in code hosting platforms. Among the various approaches proposed, consistent code styling and formatting across a project have been shown to significantly improve both readability and developers' ability to understand the context, functionality, and purpose of a block of code. Most code formatting approaches rely on a set of rules defined by experts that aspire to model a commonly accepted formatting style. This approach depends on the experts' experience and best-practice knowledge, is time-consuming, and does not take into account the way a team actually develops software. Thus, it becomes too intrusive and, in many cases, is not adopted. In this work, we present an automated mechanism that learns the formatting style of a team directly from its own codebase.
In software development, maintaining high software quality is a persistent challenge, often impeded by an incomplete understanding of how specific code modifications influence quality metrics. This study aims to bridge this gap with an approach that assesses and interprets the impact of code modifications on these metrics.
We hypothesize that code modifications inducing similar changes in software quality metrics can be grouped into distinct clusters, and that these clusters can be effectively described using an AI language model, thus providing a nuanced understanding of code changes and their quality implications.
To validate this hypothesis, we analyzed a substantial dataset from popular GitHub repositories, segmented into individual code modifications. Each modification was evaluated against software quality metrics before and after its application. Machine learning techniques were utilized to cluster the modifications according to the changes they induced in these metrics.
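The clustering step described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: each code modification is represented by a hypothetical vector of metric deltas (change in complexity, lines of code, and coupling before vs. after the modification), and a plain k-means implementation groups modifications with similar deltas. The example data and metric names are assumptions for illustration only.

```python
import random

# Hypothetical per-modification metric deltas:
# (delta_complexity, delta_loc, delta_coupling).
# Values are illustrative, not taken from the study's dataset.
deltas = [
    (-3.0, -10.0, -1.0),   # e.g. refactorings that reduce complexity
    (-2.5,  -8.0, -0.5),
    (-3.2, -12.0, -1.2),
    ( 4.0,  30.0,  2.0),   # e.g. feature additions that grow the code
    ( 3.5,  25.0,  1.5),
    ( 4.2,  28.0,  2.2),
]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on metric-delta vectors."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # Recompute each center as the mean of its assigned points.
        centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

centers, groups = kmeans(deltas, k=2)
```

Each resulting cluster collects modifications with a similar quality-metric footprint; in the study, such clusters are then described in natural language by an AI language model.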