@snej
Created October 2, 2023 16:19
Script to find missing std #includes in C++ headers
#! /usr/bin/env ruby
#
# missing_includes.rb
# By Jens Alfke <jens@couchbase.com>
# Version 2.0 -- 2 Oct 2023
# Copyright 2021-Present Couchbase, Inc.
#
# This script scans C++ header files looking for usage of common standard library classes, like
# `std::vector`, without including their corresponding headers, like `<vector>`. It similarly looks
# for standard C functions like `strlen` that are used without including their header (`<cstring>`).
#
# Such files may successfully build with one compiler, or standard library implementation, but
# fail with another, due to differences in which other headers the standard library headers include.
#
# **This script is, unapologetically, a hack.** It's not software engineering, it's a quick way to
# alleviate a problem I keep having when I submit my (Xcode-with-Clang built) local branch to
# upstream CI and get unknown-identifier errors from GCC and/or MSVC.
#
# Examples of output:
# - Default mode:
# *** include/foo.hh
# #include <functional> // for std::function, line 154
# - Compiler-warning mode (`--warn`):
# include/foo.hh:154: warning: Use of 'std::function' without including <functional>
#
# Disclaimers & Limitations:
#
# * This script does not use a real parser, just a bunch of regexes. [Obligatory jwz link]
# * It does not know about every symbol in every library header, just the ones I've added to the
#   tables below. You are most welcome to add more.
# * It assumes the `std::` namespace is used explicitly, i.e. it ignores `vector` by itself.
# * Some functions, like `std::swap`, are defined in multiple headers with different parameter
#   types. A simple hack like this can't possibly understand that.
# * It doesn't know about the original C headers like `<string.h>`, just their C++ adapters.
# * **It does not follow `#includes`.** It doesn't look at local headers #include'd by a
#   header, so it will complain about `std::vector` even if the current header includes another
#   header that includes `<vector>`. This is partly laziness, but mostly intentional. In such a
#   situation you might alter the include'd header to not use vectors any more and remove the
#   `#include <vector>`, causing a bunch of other header files to break. Or you might copy
#   the downstream header to another project and then it won't compile until you figure out what
#   includes to add.
# * **It only looks at header files, not source files.** Due to the above limitations, it's a lot
#   less useful in source files. Source files commonly don't repeat library includes from their
#   matching header. Source files often do `using namespace std`; at least, mine do.
#
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
require "optparse"
require "ostruct"
require "pathname"
require "set"
# ANSI terminal color escapes
PLAIN = "\033[0m"
BOLD = "\033[1m"
DIM = "\033[2m"
ITALIC = "\033[3m"
UNDERLINE = "\033[4m"
CYAN = "\033[96m"
YELLOW = "\033[93m"
# The set of filename extensions I look for; the --ext flag adds to this.
HeaderFileExtensions = Set.new([".hh", ".hpp"])
# `look_for()` adds this prefix to identifiers.
StdPrefix = "std::"
# Maps from system header name to an array of other headers it canonically includes.
# (I got this from looking at the header documentation at cppreference.com.)
Includes = {
  "algorithm" => ["initializer_list"],
  "array" => ["compare", "initializer_list"],
  "chrono" => ["compare"],
  "iostream" => ["ios", "streambuf", "istream", "ostream"],
  "map" => ["compare", "initializer_list"],
  "memory" => ["compare"],
  "optional" => ["compare"],
  "set" => ["compare", "initializer_list"],
  "stdexcept" => ["exception"],
  "string" => ["compare", "initializer_list"],
  "string_view" => ["compare"],
  "tuple" => ["compare"],
  "unordered_map" => ["compare", "initializer_list"],
  "utility" => ["compare", "initializer_list"],
  "variant" => ["compare"],
  "vector" => ["compare", "initializer_list"],
}
# Adds `name`, and any other headers known to be included by it, to the set `headers`.
def addIncludes(headers, name)
  unless headers.include?(name)
    headers.add(name)
    included = Includes[name]
    if included
      included.each { |h| addIncludes(headers, h) }
    end
  end
end
# Maps a regex for a class/fn name, to the name of the header it's defined in
Classes = Hash.new()
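# Adds a bunch of identifiers and the header they're defined in.
# Each identifier is a regex fragment (e.g. "atomic_\\w+"), matched with the `std::` prefix.
def look_for(header, identifiers)
  identifiers.each do |cls|
    Classes[Regexp.new("\\b" + StdPrefix + cls + "\\b")] = header
  end
end

# Same as `look_for`, but for C functions: matched as bare words, with or without `std::`.
def look_for_c(header, identifiers)
  identifiers.each do |cls|
    Classes[Regexp.new("\\b" + cls + "\\b")] = header
  end
end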
# First parameter is header name, second is a list of symbols (without the `std::`) that require that header.
look_for("algorithm", ["binary_search", "clamp", "lower_bound", "max", "min", "minmax", "sort", "upper_bound"])
look_for("any", ["any", "make_any", "any_cast"])
look_for("array", ["array"])
look_for("atomic", ["atomic", "atomic_\\w+", "memory_order"])
look_for("chrono", ["chrono"])
look_for("compare", ["strong_order", "weak_order", "partial_ordering", "weak_ordering", "three_way_comparable", "three_way_comparable_with"])
look_for("deque", ["deque"])
look_for("exception", ["exception", "current_exception", "exception_ptr", "make_exception_ptr", "rethrow_exception", "terminate"])
look_for("fstream", ["filebuf", "ifstream", "ofstream", "fstream"])
look_for("functional", ["function", "bind", "ref", "invoke", "invoke_r", "mem_fn", "reference_wrapper", "unwrap_reference"])
look_for("initializer_list",["initializer_list"])
look_for("iosfwd", ["char_traits", "istream", "ostream", "fstream", "stringstream", "fpos"])
look_for("iostream", ["cerr", "cin", "cout", "clog"])
look_for("map", ["map", "multimap"])
look_for("memory", ["make_unique", "make_shared", "shared_ptr", "unique_ptr", "weak_ptr", "allocator", "allocator_traits", "pointer_traits"])
look_for("mutex", ["mutex", "timed_mutex", "recursive_mutex", "lock_guard", "unique_lock", "scoped_lock", "once_flag", "call_once"])
look_for("optional", ["make_optional", "optional", "nullopt"])
look_for("regex", ["regex", "sub_match", "match_results"])
look_for("set", ["set"])
look_for("sstream", ["stringstream", "istringstream", "ostringstream", "stringbuf"]) # replaces the <iosfwd> entry for "stringstream"
look_for("string", ["string", "basic_string", "char_traits", "stoi", "stol", "stoll", "stoul", "stoull", "stof", "stod", "to_string", "npos"])
look_for("stdexcept", ["logic_error", "runtime_error", "invalid_argument", "domain_error", "length_error", "range_error", "overflow_error", "underflow_error"])
look_for("string_view", ["string_view"])
look_for("tuple", ["tie", "tuple"])
look_for("typeinfo", ["type_info", "bad_typeid", "bad_cast"])
look_for("unordered_map", ["unordered_map", "unordered_multimap"])
look_for("unordered_set", ["unordered_set", "unordered_multiset"])
look_for("utility", ["forward", "move", "pair", "get", "swap"])
look_for("variant", ["variant", "visit", "get_if", "monostate"])
look_for("vector", ["vector"])
# TODO: This is obviously incomplete. I've just been adding the most common stuff I find.
look_for_c("cassert", ["assert"])
look_for_c("cmath", ["abs", "ceil", "floor"])
look_for_c("cstring", ["memcmp", "memcpy", "memmove", "strlen", "strcpy", "strchr", "strrchr"])
look_for_c("cstdio", ["printf", "sprintf", "fprintf"])
##### TOOL CODE
# Command-line options
Options = OpenStruct.new
Options.verbose = false
Options.prefix = nil
Options.commonHeaders = Set.new()
Options.humanReadable = true
Options.diagnostic = "warning"
# Process result
$finalResult = 0
# Processes a file: reports known symbols used without a prior #include of their header.
# Returns the set of headers the file includes (plus any that were reported as missing).
def scan_file(pathname)
  headers = Options.commonHeaders.clone()
  first = true
  lineno = 0
  # File.foreach closes the file once the block finishes.
  File.foreach(pathname.to_s, encoding: "UTF-8") do |line|
    lineno += 1
    # TODO: Remove C-style comments, even multiline
    line = line.split("//", 2)[0] || ""  # drop any trailing `//` comment
    # Whitespace after `#` and before `<` is optional, so `#include<vector>` is recognized too.
    if line =~ /^\s*#\s*include\s*<(\w+(\.h)?)>/
      # Found an #include <...>:
      addIncludes(headers, $1)
    else
      Classes.each do |classRegexp, headerName|
        if not headers.include?(headerName) and line =~ classRegexp
          # Found a symbol without a prior #include of its header:
          name = classRegexp.source[2..-3]  # strip the leading and trailing "\b"
          if Options.humanReadable
            if first
              first = false
              puts "#{BOLD}*** #{pathname.parent}/#{YELLOW}#{pathname.basename}#{PLAIN}"
            end
            $stdout.write "\t\#include #{BOLD}#{CYAN}<#{headerName}>#{PLAIN}"
            if Options.verbose
              $stdout.write "\t#{ITALIC}#{DIM}// for #{name}, line #{lineno}#{PLAIN}"
            end
            puts ""
            $finalResult = 1
          else
            # Machine-readable (compiler output) form:
            $stderr.write "#{pathname.parent}/#{pathname.basename}:#{lineno}: #{Options.diagnostic}: Use of '#{name}' without including <#{headerName}> [missing_includes.rb]\n"
            $finalResult = 1 if Options.diagnostic == "error"
          end
          headers.add(headerName)  # So I don't complain about the same header again
          # TODO: Would be nice to alphabetize by header name
        end
      end
    end
  end
  return headers
end
# Processes a directory tree
def scan_tree(dir)
  dir.find do |file|
    if HeaderFileExtensions.include?(file.extname)
      begin
        scan_file(file)
      rescue => detail
        $stderr.write "Exception scanning #{file}: #{detail}\n\t"
        $stderr.write detail.backtrace.join("\n\t"), "\n\n"
        $finalResult = -1
      end
    end
  end
end
OptionParser.new do |opts|
  opts.banner = "#{BOLD}Usage: missing_includes.rb DIR...#{PLAIN}"
  opts.on("--prefix HEADER", "--base HEADER", "Assume every header includes this file") do |p|
    Options.commonHeaders.merge(scan_file(Pathname.new(p)))
  end
  opts.on("--ignore HEADER", "Ignore missing <HEADER>. May give multiple headers separated by commas.") do |h|
    Options.commonHeaders.merge(h.split(","))
  end
  opts.on("--ext EXT", "Scan filenames ending with EXT too.") do |ext|
    ext = "." + ext unless ext.start_with?(".")
    HeaderFileExtensions.add(ext)
  end
  opts.on("--warn", "--warning", "-w", "Write compiler-style warnings to stderr") do
    Options.humanReadable = false
    Options.diagnostic = "warning"
  end
  opts.on("--error", "-e", "Write compiler-style errors to stderr") do
    Options.humanReadable = false
    Options.diagnostic = "error"
  end
  opts.on_tail("-v", "--[no-]verbose", "Verbose: show why each #include is needed") do |v|
    Options.verbose = v
  end
  opts.on_tail("-h", "--help", "Show this message") do
    puts opts
    puts ""
    puts "Finds C++ and C standard library headers you should probably \#include."
    puts "Looks at all '.hh' and '.hpp' files in each given directory tree."
    puts "When it finds a standard library identifier it knows about, like `std::vector` or"
    puts "`strlen`, it checks if the corresponding header was included; if not, it prints a warning."
    puts
    puts "It works from a hardcoded list of common identifiers; this list is not comprehensive."
    puts "Nor does it scan other local headers transitively included."
    puts "Hopefully you'll find it useful anyway! I do."
    exit
  end
end.parse!
if ARGV.empty?
  puts "Please give at least one directory to scan. (Use --help for help.)"
  exit 1
end

ARGV.each do |arg|
  scan_tree(Pathname.new(arg))
end

exit $finalResult
@Beyarz
Beyarz commented Oct 2, 2023

Does it work for C as well?

@snej
Author

snej commented Oct 4, 2023

Yes, with the flag --ext .h. But to make it useful you'd need to add a lot more look_for_c(...) rules, and append .h to the header names in the first parameter.
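
For illustration, here is a minimal sketch of what such C rules might look like. It is not part of the script: `C_RULES`, `look_for_c_h`, and `missing_for` are hypothetical names, and the functions listed under each header are only examples. The header names keep their `.h` suffix because the script's `#include` regex captures the name exactly as written between the angle brackets.

```ruby
# Hypothetical stand-in for the script's `Classes` table, holding C-only rules.
C_RULES = {}

# Same idea as `look_for_c` in the script, but the header name carries its ".h"
# suffix so it compares equal to what `#include <string.h>` records.
def look_for_c_h(header, identifiers)
  identifiers.each do |id|
    C_RULES[Regexp.new("\\b" + id + "\\b")] = header
  end
end

look_for_c_h("string.h", ["memcmp", "memcpy", "memmove", "strlen", "strcpy", "strchr", "strrchr"])
look_for_c_h("stdio.h",  ["printf", "sprintf", "fprintf"])
look_for_c_h("stdlib.h", ["malloc", "calloc", "realloc", "free"])  # assumed additions

# Returns the headers that one line of C appears to need,
# skipping any header already present in `included`.
def missing_for(line, included)
  C_RULES.select { |re, hdr| !included.include?(hdr) && line =~ re }.values.uniq
end

p missing_for("size_t n = strlen(s);", ["stdio.h"])
```

With rules like these added to the script itself through `look_for_c`, a run such as `missing_includes.rb --ext .h include/` (where `include/` is just an example directory) should pick up `.h` files as well.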
