@yegor256
Last active April 21, 2024 07:56
SQM: Research Questions

The following research questions are for the SQM course students:

Q1: In Java, there are object methods and class methods (also known as "static" methods). We suspect that the presence of static methods in Java classes negatively affects the quality of code. To validate this intuition, we analysed 100K+ Java files from 100+ open-source GitHub projects. We measured the number of static methods in them and their size in Lines of Code, keeping in mind that larger source code files have higher Cyclomatic Complexity (CC). Higher CC means lower maintainability. Then, we summarised the results obtained.
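
As a rough illustration of the measurement step in Q1, static methods in a Java file might be counted with a regular expression. This is a minimal sketch and not the tooling used in the study: the pattern and the function name `count_static_methods` are assumptions, and the heuristic ignores comments and string literals, so a real analysis would use a proper Java parser.

```python
import re

# Heuristic: the keyword "static" followed by an opening parenthesis,
# with no ';', '=', '{', '}' or parenthesis in between. This skips
# static fields, static initializer blocks, static nested classes and
# "import static" lines, but it can be fooled by comments or strings.
STATIC_METHOD_RE = re.compile(r"\bstatic\b[^;={}()]*\(")


def count_static_methods(java_source: str) -> int:
    """Return an approximate number of static method declarations."""
    return len(STATIC_METHOD_RE.findall(java_source))


def lines_of_code(java_source: str) -> int:
    """Return the number of non-blank lines in the file."""
    return sum(1 for line in java_source.splitlines() if line.strip())
```

Pairing `count_static_methods` with `lines_of_code` per file gives the two columns that Q1 proposes to compare.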

Q2: In Java, some object attributes are mutable, while others are immutable (with the "final" modifier attached to them). Objects that have at least one mutable attribute may be referred to as "mutable" objects. We suspect that mutable objects have higher Cyclomatic Complexity (CC) than immutable ones. To validate this intuition, we analysed 100K+ Java files from 100+ open-source projects. We calculated the relationship between the number of mutable attributes in a Java class and the average CC of its methods. Then, we summarised the results obtained. (We already wrote a paper about this, now trying to publish it!)

Q3: Common wisdom in software development holds that functions and methods with fewer parameters are preferable for a number of reasons. We suspect that functions with more parameters are larger than functions with fewer parameters. To validate this intuition, we analysed 100K+ Java classes from 100+ open-source GitHub projects and evaluated the relation between the number of parameters and the Cyclomatic Complexity of methods. Then, we summarised the results obtained.

Q4: Using NULL in object-oriented programming is considered a bad practice. However, it has never been demonstrated how the presence of NULL impacts the complexity of the code, thus affecting its size and quality (it actually was). We suspect that Java classes where NULL is used more frequently are larger than those where NULL is used less frequently. To validate this intuition, we analysed 100K+ Java classes from 100+ public GitHub repositories, measuring the number of NULL references and the Cyclomatic Complexity of methods. Then, we summarised the results obtained. (We already wrote a paper about this, now trying to publish it!)

Q5: Some practitioners believe that classes whose names have the "-ER" suffix are bad design decisions. Such classes are less cohesive than other classes. Utility classes, which are suffixed as "-Utils," also belong to the same category of bad design decisions due to inevitably lower cohesion. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories, measuring the cohesion of classes and the presence of naming patterns. Then, we summarized the results obtained.
Published on arXiv.

Q6: Compound variable names are a design smell, according to some practitioners. On the other hand, other software experts believe that longer names contribute to higher code quality. To find out which view holds, we analyzed 100K+ Java classes from 100+ public GitHub repositories, measuring the cyclomatic complexity of methods and the average length of the names of their variables and parameters. Then, we summarized the results obtained. (We already wrote a paper about this, now trying to publish it!)

Q7: Some object-oriented programming languages, like Java and C++, offer a method overloading feature that enables the declaration of more than one method in a class with the same name but different types of parameters. Some practitioners believe that classes where methods or constructors are overloaded exhibit flawed design, especially leading to larger and more complex code. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories, comparing the presence of overloaded methods with the size and complexity of classes. Then, we summarized the results obtained.

Q8: In Java, most classes depend on other classes, either coming from the Java SDK or external libraries. Intuition tells us that the more dependencies a class has, the lower its maintainability is, which directly and negatively impacts the quality of code. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories, comparing the number of import instructions with the size of the code and its complexity. Then, we summarized the results obtained.
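
The "number of import instructions" in Q8 could be approximated as follows. This is only a sketch under stated assumptions: the pattern `IMPORT_RE` and the function `count_imports` are hypothetical names, and an import-like line inside a block comment would be miscounted.

```python
import re

# Matches Java import statements at the start of a line, including
# "import static ..." and wildcard imports such as "import java.util.*;".
IMPORT_RE = re.compile(
    r"^\s*import\s+(?:static\s+)?[\w.]+(?:\.\*)?\s*;",
    re.MULTILINE,
)


def count_imports(java_source: str) -> int:
    """Return the number of import statements found in the source text."""
    return len(IMPORT_RE.findall(java_source))
```

A wildcard import counts as one instruction here, although it may bring in many classes; whether that is the right unit for measuring dependencies is a design choice the question leaves open.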

Q9: It may be reasonable to assume that smaller classes, which have fewer methods, are less complex. In other words, the Cyclomatic Complexity of methods is lower if there are fewer methods in the class. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories, comparing the number of methods in a class and the complexity of said methods. Then, we summarized the results obtained. (We already wrote a paper about this, now trying to publish it!)

Q10: It may be obvious that open-source repositories with higher quality of code attract more developers and receive more contributions from them. Programmers may be interested in putting their efforts where they see code that is simpler and less complex. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories, comparing the complexity and size of the code with GitHub metrics, like the number of forks, commits, and pull requests. Then, we summarized the results obtained.

Q11: There is a belief among some software practitioners that class methods should not contain empty lines inside their bodies, because such empty lines are indicators of complexity gone out of control. If an empty line is required to split a method into two parts, it may be more reasonable to split the method into two methods. To validate how the presence of empty lines correlates with the complexity of methods and their size, we analyzed 100K+ Java classes from 100+ public GitHub repositories. Then, we summarized the results obtained. (We already wrote a paper about this, now trying to publish it!)
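
One possible way to count empty lines inside method bodies, as Q11 requires, is to track brace depth and count blank lines that occur at depth two or deeper (inside a top-level class and inside a method). This is a simplification and an assumption of this sketch: braces in strings or comments are not handled, and blank lines between the members of a nested class would also be counted.

```python
def blank_lines_in_methods(java_source: str) -> int:
    """Count blank lines at brace depth >= 2 (roughly: inside method bodies)."""
    depth = 0
    blanks = 0
    for line in java_source.splitlines():
        # A blank line is counted using the depth reached before this line.
        if not line.strip() and depth >= 2:
            blanks += 1
        # Update the depth with the braces opened and closed on this line.
        depth += line.count("{") - line.count("}")
    return blanks
```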

Q12: Giving a good descriptive name to a function is paramount to its quality, including readability, complexity, and maintainability. To confirm this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories and compared the readability of method names with their complexity. We used a Large Language Model (LLM) to verify the readability of names and Cyclomatic Complexity as a metric for complexity. Then, we summarized the results obtained.

Q13: Encapsulation, in object-oriented programming, also known as information hiding, implies that object attributes are invisible to outside code and may only be accessed by the methods of the object. However, not every program adheres to the principle of encapsulation, thus making attributes public and available for reading and/or modification. This may potentially impact the quality of the class. To validate how the presence of public attributes correlates with the complexity of methods in the class and their size, we analyzed 100K+ Java classes from 100+ public GitHub repositories. Then, we summarized the results obtained.

Q14: Getters and setters are a popular design pattern in object-oriented code, which some practitioners consider an anti-pattern. They believe that classes with getters and setters belong to the so-called anemic domain model, which makes objects smaller and controllers and utility classes larger. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories to find a correlation between the presence of getters and setters and the structure of classes: number of methods, size of methods, number of constructors, and so on. Then, we summarized the results obtained.

Q15: The amount of data and objects an object encapsulates may be an indicator of the quality of its design. When an object encapsulates nothing, it may not be a proper object but rather a utility class or a controller. When an object encapsulates a dozen objects, it may be a non-cohesive object. We believe that there is a correlation between the number of attributes of a class and its cohesion and complexity. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories and then summarized the results obtained.

Q16: Some software practitioners advocate the idea that methods in a class must either be manipulators or builders. A manipulator may only make changes to the object and must return void. A builder is not allowed to make changes but must return a new object. We expect that not many programmers follow this principle. To understand how many methods are designed with this principle in mind, we analyzed 100K+ Java classes from 100+ public GitHub repositories with the help of an LLM, paying attention to the way methods are named: either as nouns or verbs. Then, we summarized the results obtained.

Q17: There is a belief that every class in object-oriented programming must implement at least one interface (in the case of Java) and have no public methods that are not defined in the interfaces that the class implements. Such a design would potentially lead to more cohesive and less complex classes. To validate this belief and find out whether this principle indeed leads to smaller and more cohesive classes, we analyzed 100K+ Java classes from 100+ public GitHub repositories, trying to find a correlation between the presence of interfaces that they implement and their complexity. Then, we summarized the results obtained.

Q18: Public static constants are known to be a recommended replacement for inline literals, such as numbers or strings. They are intended to have descriptive names that explain the semantics of the data, helping programmers understand what the data is for. However, we expect that a rather large number of constants don't have descriptive names but are named randomly, which misleads programmers even more than nameless data. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories, found all public constants used in classes, asked a Large Language Model to verify the meaning of their names, and then summarized the results obtained.

Q19: There are a number of possible elements that constitute a class in Java, such as methods, attributes, constructors, inherited interfaces, parent class, static methods, and so on. We were interested in knowing what the most "typical" structure of a Java class is and what the distribution of other classes is around the median. To answer this question, we analyzed 100K+ Java classes from 100+ public GitHub repositories, and then summarized the results obtained.

Q20: Getters and setters are a well-known design pattern, especially in Java. A getter is supposed to be a method that returns the value of an encapsulated attribute, while a setter is supposed to set the value of it. However, some programmers misunderstand how getters and setters are intended to be designed; they include a lot more evaluations inside them. In order to find out how often getters and setters are not designed as expected, we analyzed 100K+ Java classes from 100+ public GitHub repositories, and then summarized the results obtained.

Q21: In Java, object attributes may refer to primitives, such as int or float, or they may refer to objects. There's a hypothesis that classes which encapsulate only primitives or only objects have lower complexity and higher cohesion than those that have attributes of both kinds. To validate this intuition, we analyzed 100K+ Java classes from 100+ public GitHub repositories and then summarized the results obtained.

Q22: In Java, a class may inherit from another class. Some experts believe that inheritance can have a negative impact on the quality of code and suggest using composition instead. To validate this belief, we analyzed over 100K Java classes from more than 100 open-source repositories. We compared the presence of a parent class with the complexity of the methods in the child class and then summarized the results obtained.

Q23: In Java, some classes may be generic, having one or more type parameters. Our hypothesis is that generic classes are generally more cohesive and less complex than other classes. To validate this belief, we analyzed over 100K Java classes from more than 100 open-source repositories. We compared the presence of type parameters in class declarations with the complexity of their methods and then summarized the results obtained.

Q24: In Java, annotations are a popular instrument in some frameworks. However, some experts believe that the impact of annotations on the quality of code may be negative. To validate this belief, we analyzed over 100K Java classes from more than 100 open-source repositories. We compared the presence of annotations in class declarations with the complexity of their methods and then summarized the results obtained.

Q25: There is a belief that Test-Driven Development is only used later in the lifecycle of a repository, while initially no tests are written. It would be interesting to study a large number of GitHub repositories and either confirm or refute this theory.

Q26: The ratio of failed to passed CI builds may be an indicator of the quality of a GitHub repository. To validate this assumption, we analyzed over 1000 open Java GitHub repositories and compared the statuses of their CI builds with the complexity of their code.
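
The ratio in Q26 could be computed per repository along these lines. The status labels `"success"` and `"failure"` are assumptions made for this sketch; actual CI systems report their own sets of conclusions, and builds with any other status (cancelled, skipped) are simply ignored here.

```python
from typing import Iterable, Optional


def failed_to_passed_ratio(statuses: Iterable[str]) -> Optional[float]:
    """Return failed/passed build ratio, or None if no build has passed."""
    failed = 0
    passed = 0
    for status in statuses:
        if status == "failure":
            failed += 1
        elif status == "success":
            passed += 1
        # Any other status is neither a pass nor a failure and is skipped.
    if passed == 0:
        return None  # the ratio is undefined without passed builds
    return failed / passed
```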

Q27: Comment density is the ratio between the number of comment lines and the total number of lines in a class or a method. There is a hypothesis that programmers tend to add comments when the complexity of the code is too high. Thus, comment density may be a good predictor of quality/complexity problems. To validate this assumption, we analyzed 100K+ Java classes and calculated the Pearson correlation between their comment density and their cyclomatic complexity.
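
The two quantities named in Q27 can be sketched as below. The line classifier is an assumption of this example: a line counts as a comment line only if it starts with `//` or `/*` or lies inside a `/* ... */` block, so trailing comments after code are not counted. The function names are hypothetical, and the cyclomatic complexity values are expected to come from a separate tool.

```python
from math import sqrt
from typing import Sequence


def comment_density(java_source: str) -> float:
    """Return comment lines divided by total lines (0.0 for empty input)."""
    lines = java_source.splitlines()
    if not lines:
        return 0.0
    in_block = False
    comment_lines = 0
    for line in lines:
        text = line.strip()
        if in_block:
            comment_lines += 1
            if "*/" in text:
                in_block = False
        elif text.startswith("//"):
            comment_lines += 1
        elif text.startswith("/*"):
            comment_lines += 1
            # Stay in block mode unless the comment closes on the same line.
            if "*/" not in text[2:]:
                in_block = True
    return comment_lines / len(lines)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

With one density value and one complexity value per class, `pearson(densities, complexities)` yields a coefficient between -1 and 1; a value near zero would speak against comment density as a predictor.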
