@obfusk
Created January 11, 2023 09:24
Testing proposed JavaScript magic

JavaScript detection

$ cd /usr/share/nodejs
$ find -name '*.js' | wc -l
16761
$ find -name '*.js' -print0 | sort -z | xargs -0 file \
  | grep -Ev ': *(symbolic link|(empty|directory)$)' \
  | sed -r 's/.*: *//; s/(with very long lines).*/\1/' \
  | sort | uniq -c

before:

  12977 ASCII text
      4 ASCII text, with CRLF line terminators
    435 ASCII text, with no line terminators
   1618 ASCII text, with very long lines
     69 Algol 68 source, ASCII text
     12 Algol 68 source, ASCII text, with very long lines
      3 Algol 68 source, Unicode text, UTF-8 text
      2 Algol 68 source, Unicode text, UTF-8 text, with very long lines
      4 C source, ASCII text
      1 C source, ASCII text, with very long lines
    423 C++ source, ASCII text
     12 C++ source, ASCII text, with very long lines
     27 C++ source, Unicode text, UTF-8 text
      2 C++ source, Unicode text, UTF-8 text, with very long lines
      3 CSV text
     14 HTML document, ASCII text
      3 HTML document, Unicode text, UTF-8 text
      2 HTML document, Unicode text, UTF-8 text, with very long lines
    803 Java source, ASCII text
     39 Java source, ASCII text, with very long lines
     16 Java source, Unicode text, UTF-8 text
      5 Java source, Unicode text, UTF-8 text, with very long lines
      1 LaTeX document, ASCII text
      1 LaTeX document, ASCII text, with very long lines
     59 Node.js script text executable
    147 Unicode text, UTF-8 text
      1 Unicode text, UTF-8 text, with no line terminators
     52 Unicode text, UTF-8 text, with very long lines
      1 data
      2 exported SGML document, ASCII text
      2 exported SGML document, ASCII text, with very long lines

after (97% detected):

    215 ASCII text
     13 ASCII text, with no line terminators
    224 ASCII text, with very long lines
      3 CSV text
      1 HTML document, ASCII text
  14077 JavaScript source, ASCII text
      4 JavaScript source, ASCII text, with CRLF line terminators
    422 JavaScript source, ASCII text, with no line terminators
   1461 JavaScript source, ASCII text, with very long lines
    184 JavaScript source, Unicode text, UTF-8 text
     43 JavaScript source, Unicode text, UTF-8 text, with very long lines
     54 Node.js script executable, ASCII text
      5 Node.js script executable, ASCII text, with very long lines
     12 Unicode text, UTF-8 text
      1 Unicode text, UTF-8 text, with no line terminators
     20 Unicode text, UTF-8 text, with very long lines
      1 data

$ cd chromium-101.0.4951.64
$ find -name '*.js' | wc -l
23961
$ find -name '*.js' -print0 | sort -z | xargs -0 file \
  | grep -Ev ': *(symbolic link|(empty|directory)$)' \
  | sed -r 's/.*: *//; s/(with very long lines).*/\1/' \
  | sort | uniq -c

before:

  18616 ASCII text
     50 ASCII text, with CRLF line terminators
      1 ASCII text, with CRLF, LF line terminators
    525 ASCII text, with no line terminators
   1237 ASCII text, with very long lines
    120 Algol 68 source, ASCII text
      4 Algol 68 source, ASCII text, with very long lines
     10 Algol 68 source, Unicode text, UTF-8 text
      1 Apache Avro version 101
     71 C source, ASCII text
      1 C source, ASCII text, with CRLF line terminators
      7 C source, ASCII text, with very long lines
      3 C source, Unicode text, UTF-8 text
    615 C++ source, ASCII text
     27 C++ source, ASCII text, with very long lines
     28 C++ source, Unicode text, UTF-8 text
      1 C++ source, Unicode text, UTF-8 text, with very long lines
      6 CSV text
    150 HTML document, ASCII text
      1 HTML document, ASCII text, with CRLF line terminators
      5 HTML document, ASCII text, with very long lines
      9 HTML document, Unicode text, UTF-8 text
   1767 Java source, ASCII text
     47 Java source, ASCII text, with very long lines
     49 Java source, Unicode text, UTF-8 text
      8 Java source, Unicode text, UTF-8 text, with very long lines
      1 LaTeX document, ASCII text
     10 Nim source code, ASCII text
     60 Node.js script text executable
      3 Python script, Unicode text, UTF-8 text executable, with very long lines
      1 Ruby script, ASCII text
      1 Ruby script, Unicode text, UTF-8 text
      2 SVG XML document
      1 Unicode text, UTF-8 (with BOM) text
    369 Unicode text, UTF-8 text
      1 Unicode text, UTF-8 text, with no line terminators
     66 Unicode text, UTF-8 text, with very long lines
      1 assembler source, ASCII text
      3 data
     22 exported SGML document, ASCII text
      3 exported SGML document, ASCII text, with very long lines
      3 exported SGML document, Unicode text, UTF-8 text

after (93% detected):

   1352 ASCII text
      4 ASCII text, with no line terminators
     27 ASCII text, with very long lines
      1 Apache Avro version 101
      8 C source, ASCII text
     42 C++ source, ASCII text
      1 C++ source, ASCII text, with very long lines
      6 CSV text
     21 HTML document, ASCII text
      1 HTML document, Unicode text, UTF-8 text
     37 Java source, ASCII text
     12 Java source, ASCII text, with very long lines
      5 Java source, Unicode text, UTF-8 text, with very long lines
  19914 JavaScript source, ASCII text
     52 JavaScript source, ASCII text, with CRLF line terminators
      1 JavaScript source, ASCII text, with CRLF, LF line terminators
    521 JavaScript source, ASCII text, with no line terminators
   1290 JavaScript source, ASCII text, with very long lines
      1 JavaScript source, Unicode text, UTF-8 (with BOM) text
    440 JavaScript source, Unicode text, UTF-8 text
      1 JavaScript source, Unicode text, UTF-8 text, with no line terminators
     68 JavaScript source, Unicode text, UTF-8 text, with very long lines
     56 Node.js script executable, ASCII text
      2 Node.js script executable, ASCII text, with very long lines
      1 Node.js script executable, Unicode text, UTF-8 text
      1 Node.js script executable, Unicode text, UTF-8 text, with very long lines
      3 Python script, Unicode text, UTF-8 text executable, with very long lines
      2 SVG XML document
     31 Unicode text, UTF-8 text
      2 Unicode text, UTF-8 text, with very long lines
      3 data
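
The "detected" percentages can be reproduced from the summaries themselves. The following is a minimal sketch, not necessarily how the figures above were computed: it assumes the denominator is the number of files listed in the summary, and that both "JavaScript source" and "Node.js script" count as detected. The function name detection_rate is hypothetical.

```shell
# Read `uniq -c` style lines ("  COUNT description") on stdin and print the
# share of files whose description starts with "JavaScript source" or
# "Node.js script", rounded to a whole percent.
# Assumption: Node.js scripts count as detected; lines without a leading
# count (blank lines, "before:" headers) contribute 0 to the totals.
detection_rate() {
  awk '
    { total += $1 }
    /^ *[0-9]+ (JavaScript source|Node\.js script)/ { detected += $1 }
    END { if (total > 0) printf "%.0f%%\n", 100 * detected / total }
  '
}
```

It can be appended to the pipeline used above (`... | sort | uniq -c | detection_rate`) or fed a saved summary. Under these assumptions the two "after" summaries work out to 16250 of 16740 files and 22348 of 23906 files, which round to the 97% and 93% quoted.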

C/C++ false positives

$ cd chromium-101.0.4951.64
$ find -name '*.c' | wc -l
7953
$ find -name '*.cpp' | wc -l
9624
$ find -name '*.c' -print0 | sort -z | xargs -0 file | grep JavaScript
$ find -name '*.cpp' -print0 | sort -z | xargs -0 file | grep JavaScript

C: 1 false positive (a comment containing "(function name)").

C++: 4 false positives (files that contain JavaScript code).

Java false positives

$ cd dir/with/java/code
$ find -name '*.java' | wc -l
4890
$ find -name '*.java' -print0 | sort -z | xargs -0 file | grep JavaScript

1 false positive (code containing "if (function != null)").

HTML false positives

$ cd chromium-101.0.4951.64
$ find -name '*.html' | wc -l
4497
$ find -name '*.html' -print0 | sort -z | xargs -0 file | grep JavaScript

7 false positives (partial documents, mostly JavaScript in a <script> tag).
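
To put the false-positive counts in proportion to the number of files checked, the `file` output can be summarised instead of only grepped. This is a rough sketch; the function name false_positive_rate is hypothetical, and it assumes one "path: description" line per file. Unlike a plain `grep JavaScript`, it only looks for "JavaScript" after the ": " separator, so a path that merely contains the word is not counted.

```shell
# Read `file` output ("path: description") on stdin and print how many
# descriptions mention JavaScript, out of how many lines were read.
# Assumption: paths do not themselves contain ": " followed by "JavaScript".
false_positive_rate() {
  awk '
    /: .*JavaScript/ { hits++ }
    END { printf "%d of %d (%.2f%%)\n", hits, NR, (NR ? 100 * hits / NR : 0) }
  '
}
```

For example: `find -name '*.c' -print0 | sort -z | xargs -0 file | false_positive_rate`.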
