
@dwd
Created April 14, 2014 18:28
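#ifndef TAINTCXX_H
#define TAINTCXX_H
#include <utility> // std::move

namespace taint {

// Wraps a value that came from an untrusted source.
template<typename X> class tainted {
private:
    X m_x;
public:
    tainted(X const & x) : m_x(x) {}
    tainted(X && x) : m_x(std::move(x)) {}
    tainted(tainted<X> && y) : m_x(std::move(y.m_x)) {}
    tainted(const tainted<X> & y) : m_x(y.m_x) {}
    tainted() : m_x() {}
    // Declaring the move constructor deletes the implicit assignments,
    // so bring them back explicitly.
    tainted & operator = (tainted<X> const &) = default;
    tainted & operator = (tainted<X> &&) = default;
    X const * operator -> () const {
        return &m_x;
    }
    X * operator -> () {
        return &m_x;
    }
    // Explicit, greppable points where the taint is removed.
    X const & detaint() const {
        return m_x;
    }
    X & detaint() {
        return m_x;
    }
};

// Free-function form; X is deduced from the argument.
template<typename X> X const & detaint_cast(tainted<X> const & t) {
    return t.detaint();
}

// Adapts an input stream so that it reads straight into tainted values.
template<typename S> class tainted_istream {
private:
    S & m_s;
public:
    tainted_istream(S & s) : m_s(s) {}
    template<typename X> tainted_istream<S> & operator >> (tainted<X> & t) {
        m_s >> t.detaint();
        return *this; // allows chaining: tin >> a >> b;
    }
};

} // namespace taint
#endif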

KayEss commented Apr 15, 2014

With a monad you'd have a constructor, a bind (for lifting functions so they can work with tainted values), and something to unpack the value again, which is very similar to what you've got.

What's the actual use case for this?


dwd commented Apr 15, 2014

Managing external input safely, so that things like bounds checking are clearer.

Specifically, it's the chain of thought from Heartbleed, where the reviewer missed an unsanitized value because there's no taint support in either the language or the library to make it more apparent. The result in this instance was catastrophic, but this is the kind of bug we see in plenty of software in plenty of ways, albeit usually with much less impact.

I wrote something longer on G+ in the C++ community, but it's really a matter of seeing whether I could get explicit taint tracking, so that handling tainted data was both safer and compiler-enforced, and a reviewer could easily see where the data's taint was removed (and therefore find out how).
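A minimal sketch of the compiler enforcement described here, assuming a cut-down `tainted<X>` with an `explicit` constructor and no conversion operator (not the gist's exact class); `checked_length` is a hypothetical example function. The `static_assert` confirms a tainted length cannot silently become a plain `std::size_t`, so the only route out is an explicit `detaint()` sitting next to the bounds check a reviewer wants to see.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for tainted<X>: no implicit way out.
template <typename X>
class tainted {
public:
    explicit tainted(X x) : m_x(std::move(x)) {}
    const X& detaint() const { return m_x; }
private:
    X m_x;
};

// The compiler rejects implicit use of tainted data.
static_assert(!std::is_convertible<tainted<std::size_t>, std::size_t>::value,
              "tainted values must not convert implicitly");

// Heartbleed-style check: a peer-supplied length is only accepted if it
// fits the buffer actually received. Returns -1 when it does not.
inline long checked_length(const tainted<std::size_t>& claimed, std::size_t received) {
    std::size_t n = claimed.detaint();  // explicit, greppable taint removal
    return n <= received ? static_cast<long>(n) : -1;
}
```

Passing `claimed` directly where a `std::size_t` is expected fails to compile, which is what makes every taint-removal site visible in review.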

If I understand this correctly, a monadic style would make taint removal explicit in both location and logic: a reviewer could see both where and how (and why) a value became untainted.
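One way to make both the location and the logic of taint removal explicit is to require a validator at the point of detainting. This is a sketch under that assumption: `detaint_if` and `is_reasonable_length` are hypothetical names, the 16384 limit is purely illustrative, and it needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <utility>

template <typename X>
class tainted {
public:
    explicit tainted(X x) : m_x(std::move(x)) {}

    // Hypothetical: the only way out. The caller supplies the validation,
    // so "where" (this call) and "how" (the predicate) appear together.
    template <typename Pred>
    std::optional<X> detaint_if(Pred valid) const {
        if (valid(m_x)) return m_x;
        return std::nullopt;  // rejected input stays unusable
    }

private:
    X m_x;
};

// Example validator: a reviewer can read exactly why a value is trusted.
inline bool is_reasonable_length(unsigned long n) { return n > 0 && n <= 16384; }
```

Usage: `if (auto n = claimed.detaint_if(is_reasonable_length)) { /* use *n */ }`. Forgetting the check is impossible because there is no unchecked accessor.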


KayEss commented Apr 16, 2014

In which case, I wonder whether approaching this from the other side might be beneficial. Essentially, the type system within the compiler is able to prove theorems: if a theorem is wrong, you get a compile error. So what would a theorem about your input processing look like?

Is it enough for your network code to return a tainted<std::vector<char>> and to only allow access to those bytes through at(), which is bounds-checked, or through iterators, which keep access within [begin, end) when used with the standard algorithms?
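A sketch of that idea, assuming a dedicated `tainted_bytes` class (hypothetical) in place of `tainted<std::vector<char>>`: the raw vector is private and only `at()` and const iterators are exposed. `std::vector::at()` throws `std::out_of_range` on a bad index; the iterators themselves are not checked in a normal build, so they are only safe when used as a `[begin(), end())` range. `echo` and `count_nul` are illustrative helpers.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical read-only wrapper around bytes received from the network.
class tainted_bytes {
public:
    explicit tainted_bytes(std::vector<char> bytes) : m_bytes(std::move(bytes)) {}

    char at(std::size_t i) const { return m_bytes.at(i); }  // throws std::out_of_range
    std::size_t size() const { return m_bytes.size(); }
    std::vector<char>::const_iterator begin() const { return m_bytes.cbegin(); }
    std::vector<char>::const_iterator end() const { return m_bytes.cend(); }

private:
    std::vector<char> m_bytes;
};

// Heartbleed-style echo: trusting a peer-supplied length now throws
// instead of reading past the end of the buffer.
inline std::vector<char> echo(const tainted_bytes& payload, std::size_t claimed_len) {
    std::vector<char> out;
    for (std::size_t i = 0; i < claimed_len; ++i) out.push_back(payload.at(i));
    return out;
}

// Iterator access stays in bounds when driven by a standard algorithm.
inline std::size_t count_nul(const tainted_bytes& b) {
    return static_cast<std::size_t>(std::count(b.begin(), b.end(), '\0'));
}
```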

If you're reading some sort of structure, then you really need to be able to chain the reads. I'm not sure how that ought to look. If you're OK with the tainted vector having state to mark what has been read, then you could do:

auto protocol = detaint<8, std::string>(tainted_bytes); // Fixed-length string field
auto length = detaint<2, int>(tainted_bytes); // 2-byte int in network order
auto token = detaint<std::string>(tainted_bytes, length);

The detaint could just as easily be a member of the tainted type, though. Under this model, the tainted class is really just the network data buffer that your underlying library is pulling from the socket.
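A sketch of the member-function variant of those chained reads, assuming a stateful buffer class `tainted_reader` with hypothetical members `detaint_string(n)` and `detaint_u16()` standing in for `detaint<N, std::string>` and `detaint<2, int>`. Each read consumes bytes from the front and refuses to run past the data actually received.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stateful tainted buffer with a read cursor.
class tainted_reader {
public:
    explicit tainted_reader(std::vector<char> bytes) : m_bytes(std::move(bytes)) {}

    // String field of n bytes (fixed or taken from an earlier length field).
    std::string detaint_string(std::size_t n) {
        require(n);
        auto first = m_bytes.begin() + static_cast<std::ptrdiff_t>(m_pos);
        std::string s(first, first + static_cast<std::ptrdiff_t>(n));
        m_pos += n;
        return s;
    }

    // 2-byte unsigned integer in network (big-endian) order.
    std::uint16_t detaint_u16() {
        require(2);
        auto hi = static_cast<unsigned char>(m_bytes[m_pos]);
        auto lo = static_cast<unsigned char>(m_bytes[m_pos + 1]);
        m_pos += 2;
        return static_cast<std::uint16_t>((hi << 8) | lo);
    }

    std::size_t remaining() const { return m_bytes.size() - m_pos; }

private:
    // Every read is bounds-checked against what is left in the buffer.
    void require(std::size_t n) const {
        if (n > remaining()) throw std::out_of_range("read past end of tainted buffer");
    }

    std::vector<char> m_bytes;
    std::size_t m_pos = 0;
};
```

A length field read with `detaint_u16()` is still only a claim: if it exceeds `remaining()`, the following `detaint_string()` throws rather than over-reading.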
