These can be useful statistics (alongside other static analysis tools) for seeing what kind of pressure a class in a Java or Scala codebase is under. The same approach also works for JSF: we can search XHTML files for includes to measure how often each file is pulled in. Each measure is implemented as a simple shell script that emits JSON-style output:
#!/bin/sh
if [ "$1" = "" ]; then
    echo "java_class_use.sh directory"
    exit 1
fi
SEARCHDIR=$1
echo "\"java_results\":{"
for i in $(find "$SEARCHDIR" -not -path '*/\.*' -type f -name "*.java" -exec perl -ne 'while(/\s*(public|private)\s+class\s+(\w+)\s+((extends\s+\w+)|(implements\s+\w+(\s*,\s*\w+)*))?\s*\{/g) { print "$2\n";}' {} \;); do
    RES=$(find "$SEARCHDIR" -not -path '*/\.*' -type f -name "*.java" -exec grep -o "$i" {} \; | wc -l)
    printf '\t"%s":"%s",\n' "$i" "$RES"
done
echo "}"
The output can then be collected and analyzed over time. These aren't very good measures of QUALITY but of PRESSURE: the POTENTIAL number of dependencies and how heavily each class is actually used.
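Note that the raw output isn't strictly valid JSON: it lacks outer braces and the final entry carries a trailing comma. A small post-processing sketch that makes it parseable by standard tools (the class names and counts in the sample below are made up for illustration):

```shell
# Hypothetical sample of a script's raw output: bare key, opening
# brace, quoted counts, and a trailing comma before the closing "}".
cat > raw.out <<'EOF'
"java_results":{
    "Foo":"3",
    "Bar":"1",
}
EOF

# Wrap in braces and drop the comma that directly precedes a "}".
# The sed program slides a two-line window: N appends the next line,
# the substitution removes "," when the following line is "}", then
# P/D print and shift the window.
{ echo '{'; sed 'N;s/,\(\n}\)/\1/;P;D' raw.out; echo '}'; } > merged.json
```

Only the last comma is invalid (the others are legitimate separators), so the two-line filter is sufficient regardless of how many classes the script finds.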
We can also do the same for Scala:
#!/bin/sh
if [ "$1" = "" ]; then
    echo "scala_class_use.sh directory"
    exit 1
fi
SEARCHDIR=$1
echo "\"scala_results\":{"
for i in $(find "$SEARCHDIR" -not -path '*/\.*' -type f -name "*.scala" -exec perl -ne 'while(/\s*class\s+(\w+)(\s+extends\s+\w+(\s+with\s+\w+)*)?\s*\{/g) { print "$1\n";}' {} \;); do
    RES=$(find "$SEARCHDIR" -not -path '*/\.*' -type f -name "*.scala" -exec grep -o "$i" {} \; | wc -l)
    printf '\t"%s":"%s",\n' "$i" "$RES"
done
echo "}"
We can also turn our heads to JSF and Faces and see how often each XHTML file is included.
#!/bin/sh
if [ "$1" = "" ]; then
    echo "xhtml_include_use.sh directory"
    exit 1
fi
SEARCHDIR=$1
echo "\"xhtml_results\":{"
for i in $(find "$SEARCHDIR" -not -path '*/\.*' -type f -name "*.xhtml" -exec basename {} \;); do
    RES=$(find "$SEARCHDIR" -not -path '*/\.*' -type f -name "*.xhtml" -exec grep -oF "$i" {} \; | wc -l)
    printf '\t"%s":"%s",\n' "$i" "$RES"
done
echo "}"
We can also go beyond use counts and extract the actual class dependencies that exist (note: we're still using a naive regex approach; it's not perfect, but the only real alternative I can think of is a full AST parser).
#!/bin/sh
if [ "$1" = "" ]; then
    echo "java_class_deps.sh directory"
    exit 1
fi
SEARCHDIR=$1
echo "\"java_deps_results\":{"
for i in $(find "$SEARCHDIR" -not -path '*/\.*' -type f -name "*.java" -exec perl -ne 'while(/\s*(public|private)\s+class\s+(\w+)\s+((extends\s+\w+)|(implements\s+\w+(\s*,\s*\w+)*))?\s*\{/g) { print "$2\n";}' {} \;); do
    RES=$(find "$SEARCHDIR" -not -path '*/\.*' -type f -name "*.java" -exec grep -l "$i" {} \;)
    printf '\t"%s":{\n' "$i"
    for n in $RES; do
        BN=$(basename -s .java "$n")
        printf '\t\t"%s":"%s",\n' "$BN" "$n"
    done
    printf '\t},\n'
done
echo "}"
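One concrete caveat with the grep-based counting above: plain `grep -o` matches substrings, so a class named User is also counted inside every occurrence of UserService. A quick illustration (the file and class names here are made up); adding `-w` anchors matches on word boundaries and removes that class of false positive:

```shell
# Hypothetical two-class file to show substring over-counting.
printf 'class User {}\nclass UserService extends User {}\n' > demo.java

grep -o 'User' demo.java | wc -l    # 3: includes the "User" inside "UserService"
grep -ow 'User' demo.java | wc -l   # 2: -w counts whole-word matches only
```

Swapping `grep -o` for `grep -ow` in the scripts above tightens the counts without changing their structure.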
Unfortunately, as you might notice, this has an O(n^2) runtime (each class name triggers another scan of every file, so each document is searched for both definition and reference). Rather than implement a more sophisticated AST parser to cut the work down, we'll just go with BRUTE FORCE! We can do this by creating a ramdisk, cloning into the ramdisk, then searching there.
This is OS X specific, but similar commands exist for Linux (a tmpfs or loopback ramdisk mount) and Windows (imdisk).
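For reference, a sketch of the Linux equivalent using tmpfs; the mount point and 2g size here are assumptions for illustration, not tested values:

```shell
# Linux sketch (untested here): tmpfs is a RAM-backed filesystem,
# so no loop device is needed. Requires root.
sudo mkdir -p /mnt/BlastRadius
sudo mount -t tmpfs -o size=2g tmpfs /mnt/BlastRadius
# ... clone repos and run the scripts against /mnt/BlastRadius ...
sudo umount /mnt/BlastRadius
```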
In addition, we only need the tip of the master branch of each git repo, not the entire history and all branches; cloning (and thus searching) only a shallow copy of master makes this considerably faster.
#!/bin/sh
diskutil erasevolume HFS+ 'BlastRadius' $(hdiutil attach -nomount ram://4194304)
git clone --depth 1 --branch master https://github.com/somename/somerepo /Volumes/BlastRadius/somerepo
# ... repeat for each repo ...
./java_class_deps.sh /Volumes/BlastRadius/ > java_class_deps.json
./java_class_use.sh /Volumes/BlastRadius/ > java_class_use.json
./scala_class_use.sh /Volumes/BlastRadius/ > scala_class_use.json
./xhtml_include_use.sh /Volumes/BlastRadius/ > xhtml_include_use.json
diskutil unmount /Volumes/BlastRadius
Note this script requires manually adding the git repos you wish to analyze. It's also limited to 2GB; you can increase this by upping the ram://XXXXX value (it must be a multiple of 2048 to avoid page faults!).
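For sizing, hdiutil's ram:// number is a count of 512-byte sectors, which is why 4194304 gives the 2GB above. The arithmetic for a larger disk, e.g. 4GB:

```shell
# ram:// takes 512-byte sectors: bytes / 512 = sectors.
echo $(( 4 * 1024 * 1024 * 1024 / 512 ))   # 8388608 sectors for a 4GB ramdisk
```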