In Haskell, functions are pure and effects are explicit.
append :: [a] -> [a] -> [a]
filter :: (a -> Bool) -> [a] -> [a]
If you want to cause an effect, you need to write a function that returns an action:
filterIO :: (a -> IO Bool) -> [a] -> IO [a]
putStr :: String -> IO ()
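As a warm-up, here is one way filterIO could be implemented. This is essentially filterM from Control.Monad specialized to IO; the definition below is a sketch, not taken from any particular library:

```haskell
module Main where

-- A possible implementation of the effectful filter above; this is
-- essentially filterM from Control.Monad, specialized to IO.
filterIO :: (a -> IO Bool) -> [a] -> IO [a]
filterIO p = foldr step (pure [])
  where
    step x acc = do
      keep <- p x   -- run the effectful predicate
      rest <- acc   -- filter the remainder of the list
      pure (if keep then x : rest else rest)

main :: IO ()
main = do
  evens <- filterIO (pure . even) [1 .. 10 :: Int]
  if evens == [2, 4, 6, 8, 10]
    then putStrLn "ok"
    else error "unexpected result"
```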
We will now look at an API that consists of two functions: one for getting the system time, and one for getting the contents of /etc/shadow. Clearly, granting access to the first function is far less problematic. We start by providing the unprotected functionality in a module called TCB (for "trusted computing base").
module Api.TCB where
unsafeGetTime :: IO UTCTime
unsafeGetShadowFile :: IO [String]
What we want instead is two functions that internally may call the above, but on the surface keep track of the legitimacy of all operations:
-- (still in Api.TCB)
data L = Public | Member | Admin
deriving (Eq, Ord, Show)
instance Label L where ...
getTime :: LIO L UTCTime
getShadowFile :: LIO L [String]
L is the clearance level required for the action currently performed. We will look at a more sophisticated Label instance in the next section. For now, just think of the labels as totally ordered.
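To make "totally ordered" concrete, here is a sketch of what such a label type could provide. The class below is illustrative and deliberately simplified; the real Label class lives in the lio package and has a slightly different interface:

```haskell
module Main where

-- Illustrative stand-in for lio's Label class (simplified sketch).
class (Eq l, Show l) => MyLabel l where
  canFlowTo :: l -> l -> Bool  -- may data at the first label flow to the second?
  lub       :: l -> l -> l     -- least upper bound (join)
  glb       :: l -> l -> l     -- greatest lower bound (meet)

data L = Public | Member | Admin
  deriving (Eq, Ord, Show)

-- For a totally ordered label type, the lattice operations are just
-- the operations of the derived Ord instance.
instance MyLabel L where
  canFlowTo = (<=)
  lub       = max
  glb       = min

main :: IO ()
main =
  if Public `canFlowTo` Admin
       && not (Admin `canFlowTo` Member)
       && lub Member Admin == Admin
    then putStrLn "ok"
    else error "lattice laws violated"
```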
If we run an action, we need to provide a clearance level:
runTCB :: Label l => l -> LIO l a -> IO a
runTCB = ...
main = do
  role <- getUserRole
  runTCB role getShadowFile >>= print
In a web application, getUserRole may extract the user name and password from the HTTP request, verify the password, and look up the user's roles in the database if the password is correct. (In our current implementation, it needs to crash if the credentials are incorrect.)
LIO L is a monad that makes sure that a given clearance level is never violated. How is this achieved? Let's look at the implementation.
getTime is easy, because it does not require any particular clearance level. We simply need to introduce a function that turns a labelled action into an unlabelled one:
ioTCB :: LIO l a -> IO a
Note that calling ioTCB requires you to manually verify that the clearance is sufficient for performing the action. This is why modules that import it must be trusted, and why we called the current module TCB.
Now we can provide a trivial implementation of getTime:

getTime :: Label l => LIO l UTCTime
getTime = ioTCB unsafeGetTime
(FIXME: drop this example for something more entertaining?)
Access to /etc/shadow, however, needs to be marked as privileged access. lio provides the function taint for this:
taint :: Label l => l -> LIO l ()
getShadowFile :: LIO L [String]
getShadowFile = taint Admin >> ioTCB unsafeGetShadowFile
Note that the label must be fixed to L here, because we actually update the current label in the labelled IO monad.
This is the gist of module Api.TCB. We can now define a module:
module Api (getTime, getShadowFile) where
import Api.TCB (getTime, getShadowFile)
And we can mark this module as Safe! [link to SafeHaskell wiki page].
From here on, we can outsource development of an API extension in module Api.Ext to a company that employs positively malicious developers. The only thing you need to do is check that it is marked as Safe in the header. Then Api.Ext will:
- be able to use getTime and getShadowFile to implement more complex functions on top;
- provably not construct any action that causes an effect that is not legitimized by the clearance level with which the action is called.
A DCLabel has two components: the secrecy and the integrity. Each component is given by a formula in conjunctive normal form over principals. Note that there is no negation of principals available (discussed further below).
DCLabels are attached to data; they only have meaning when looked at in connection with that data.
Let's look at secrecy first:
False | means ultimate secrecy, nobody is allowed to see the value (useless)
True  | means completely public, everybody is allowed to see the value
P     | means that exactly P is allowed to see the value
We can build
P \/ Q | both P and Q individually are allowed to see the value
P /\ Q | only P and Q together are allowed to see the value
We can not build
not P | everybody but P is allowed to see the value
Let's also look at integrity:
False | means ultimate integrity, nobody can have touched the value (see below)
True  | means completely untrusted, everybody could have touched the value
P     | means that exactly P must have produced the value
We can build
P \/ Q | either P or Q individually might have touched the value
P /\ Q | only P and Q together could have produced the value
We can not build
not P | everybody but P could have touched the value
A natural observation is that secrecy can only increase, and integrity can only decrease: the secrecy of compound data is always the maximum of the secrecy of its components; the integrity of compound data is always the minimum of the integrity of its components.
So if we look at the secrecy lattice, we have
True <= P <= False
And if we look at the integrity lattice, we have
False <= P <= True
The minimal element of the combined lattice is thus
True %% False
which corresponds to "completely public" and "ultimately trusted". This is also the label that is naturally attached to an empty piece of data (e.g. Haskell's () type): for a value of type (), there is no choice. Everybody can know it, so it is naturally public. Also, nobody can or could have messed with it, so it can be ultimately trusted.
The maximal element of the combined lattice is
False %% True
which corresponds to "ultimately secret" and "completely untrusted".
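For a restricted fragment of DCLabels (secrecy a plain conjunction of principals, integrity a plain disjunction, no nested formulas), the lattice operations can be sketched with sets. This is a toy model for intuition, not the lio implementation:

```haskell
module Main where

import qualified Data.Set as Set

type Principal = String

-- Toy model: secrecy is a conjunction of principals ({"P","Q"} means P /\ Q),
-- integrity is a disjunction ({"P","Q"} means P \/ Q). The empty set plays
-- the role of True for secrecy and False for integrity.
data Lbl = Lbl
  { secrecy   :: Set.Set Principal
  , integrity :: Set.Set Principal
  } deriving (Eq, Show)

-- a `canFlowTo` b: b must be at least as secret and at most as trusted.
canFlowTo :: Lbl -> Lbl -> Bool
canFlowTo a b = secrecy a `Set.isSubsetOf` secrecy b
             && integrity a `Set.isSubsetOf` integrity b

-- Least upper bound: secrecy only grows, integrity only degrades.
lub :: Lbl -> Lbl -> Lbl
lub a b = Lbl (secrecy a `Set.union` secrecy b)
              (integrity a `Set.union` integrity b)

main :: IO ()
main = do
  let public  = Lbl Set.empty Set.empty             -- True %% False
      secretP = Lbl (Set.singleton "P") Set.empty
      secretQ = Lbl (Set.singleton "Q") Set.empty
      -- reading P's secret and then Q's secret raises us to P /\ Q:
      afterPQ = public `lub` secretP `lub` secretQ
  if secrecy afterPQ == Set.fromList ["P", "Q"]
       && public `canFlowTo` secretP
       && not (secretP `canFlowTo` public)
    then putStrLn "ok"
    else error "unexpected lattice behaviour"
```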
Now let us look at the lifetime of a program, and at reading and writing labelled data. Let us at first disregard our actual permissions. Let us look at the program as a policy-free and trusted actor that just wants to maintain labelling information properly, but is allowed to do everything in principle. (This corresponds in LIO to assuming we have maximum clearance.)
Under this assumption, we can always read data of any label. However, reading this data effectively combines it with everything we already know. That is why it makes sense to maintain a current label that starts out low (e.g. at the minimal element described above) and increases over time depending on all the things we have seen.
Example for secrecy: We start out with the current label True, indicating that all information we currently have is public. We then read a secret belonging to P. Then the current label's secrecy is raised to P. We then read another secret belonging to Q. Then the current label's secrecy is raised to P /\ Q. We now have secret data available for P and Q together.
(NOTE: This is not the same as the data being available for both P and Q, which would mean either P or Q can read it. P and Q have to cooperate to access data labelled with P /\ Q.)
Example for integrity: We start out with the current label False, which means all information we currently have is ultimately trusted. We then read some information signed by P (i.e., with integrity P). Then our current label's integrity component is raised to P. If we combine this with information signed by Q (i.e., with integrity Q), then our current label's integrity component is raised to P \/ Q. We now have data available that could have been interfered with by either P or Q.
We can also write data to a variable that already has a fixed label attached to it, but we cannot always do so. It is easy to see that we can always write out data at our current label, or at any label higher than that. But if the variable has a label that is lower than our current label, we could potentially leak secret information or have untrusted information taint supposedly trusted data.
Example for secrecy: We start out with the current label True, indicating that all information we currently have is public. At this point, we can write to a variable that is marked with secrecy True, secrecy P, secrecy Q, or even secrecy P /\ Q. Once we read a secret belonging to P, we can no longer write to a variable marked with secrecy True or secrecy Q, but writing to secrecy P or secrecy P /\ Q is still fine. If we then also read a secret belonging to Q, our current secrecy label becomes P /\ Q, and the only variable we can still write to is that of secrecy P /\ Q (or yet higher). Corner case (not exception): we can always write to a variable that is ultimately secret.
Example for integrity: We start out with the current label False, indicating that all information we currently have is ultimately trusted. At this point, we can write to a variable that is marked with integrity False, integrity P, integrity Q, integrity P \/ Q, or integrity True. Once we read data that has integrity P, our current label is raised to integrity P, meaning that all data we have available could have been interfered with by P. We can now only write to variables with integrity P, integrity P \/ Q, or integrity True. If we also read data with integrity Q, our current integrity label is raised to P \/ Q, and we can only write to variables with integrity P \/ Q or integrity True. In fact, we can always write to a variable marked with integrity True, indicating that it is completely untrusted.
Doesn't this feel asymmetric? Reading always succeeds, but writing can fail? Yes, but this is caused by the role of the current label. If we read data, we can simply raise the current label as needed. If we write data, we would need to raise the label of the variable we write to. But LIO's assumption is that variables have fixed labels attached to them. So if we can't raise it, the only other option is to fail.
We can block reading of data as well. This is the role of the current clearance of a computation. The current clearance provides an upper bound to the current label. So at all times the current clearance has to be at least as high as the current label. However, the current clearance cannot be raised, it can only be lowered (voluntarily).
Example for secrecy: Suppose we set the secrecy component of the clearance of a computation to P. We still start out with the current label True. We can read a variable that is secret to P, which will raise the current secrecy label to P. But we can never read a variable that is secret to Q, because this is beyond our clearance. In general, we can read nothing that is too secret for our current clearance.
Example for integrity: If we set the integrity component of the clearance of a computation to P, we still start out with the current label False. We can read a variable that is marked with integrity P, which will raise the current integrity label to P. But we can never read a variable that is marked with integrity Q, because that is beyond our clearance. In general, we can read nothing that would be less trustworthy than indicated by our current clearance.
Clearance also affects writing of data. TODO: explain why ...
TODO: What is a useful / sane value for clearance? I think that typically the secrecy clearance will be set globally, i.e., to P for user P. This ensures that a computation running on behalf of P cannot work in any way with data that is too secret for P. However, the integrity clearance is a different issue. Any user might very well work with untrusted data. We might temporarily want to ensure that we work with trusted data. In this case, we might want to artificially lower the integrity clearance temporarily.
You can read everything that is below your current label without changing your label. You can read everything that is below your current clearance by raising the current label to the join of the old current label and the label of the data you have seen.
You can write to everything that has a label between your current label and your current clearance. Writing never changes the current label.
Special case where the current label is equal to the current clearance: you can read everything below your current clearance / label. You can write only at your current clearance / label.
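The rules above can be stated as small predicates. The sketch below uses the totally ordered L from earlier; lio expresses the same conditions via canFlowTo on arbitrary labels:

```haskell
module Main where

data L = Public | Member | Admin
  deriving (Eq, Ord, Show)

-- May data labelled `lbl` be read under clearance `clr`?
canRead :: L -> L -> Bool
canRead clr lbl = lbl <= clr

-- The current label after reading data labelled `lbl`.
afterRead :: L -> L -> L
afterRead cur lbl = max cur lbl

-- May we write to a slot labelled `target`? It has to lie between
-- the current label and the current clearance.
canWrite :: L -> L -> L -> Bool
canWrite cur clr target = cur <= target && target <= clr

main :: IO ()
main = do
  let cur = Public
      clr = Member
  if canRead clr Member
       && not (canRead clr Admin)          -- beyond clearance
       && afterRead cur Member == Member   -- label is raised by reading
       && canWrite cur clr Member
       && not (canWrite Member clr Public) -- target below current label
    then putStrLn "ok"
    else error "rule check failed"
```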
If you use LIO references that are directly marked with a label, you get everything for free:
newLIORef :: Label l => l -> a -> LIO l (LIORef l a)
This is essentially a write (of the initial data), but as you are creating new data, you can choose the label yourself. So the label has to be between your current label and your current clearance. The current label does not change.
readLIORef :: Label l => LIORef l a -> LIO l a
Reads, so the label has to be below the current clearance, and the current label is raised if necessary.
writeLIORef :: Label l => LIORef l a -> a -> LIO l ()
Writes, so it has exactly the same constraints attached to it as newLIORef. The current label does not change.
modifyLIORef :: Label l => LIORef l a -> (a -> a) -> LIO l ()
Is seen as only a write, because the original value of the reference is not exposed. So it has the same restrictions as writeLIORef.
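The LIORef behaviour can be mimicked outside of LIO with a toy model: an IORef holding the current label, plus a reference with a fixed label. This is not the lio library, just a sketch of the semantics described above (with the clearance implicitly fixed at the top label):

```haskell
module Main where

import Data.IORef

data L = Public | Member | Admin
  deriving (Eq, Ord, Show)

-- A reference with a fixed label attached to it.
data LRef a = LRef L (IORef a)

-- Reading raises the current label to the join with the ref's label.
readLRef :: IORef L -> LRef a -> IO a
readLRef cur (LRef l r) = do
  modifyIORef cur (max l)
  readIORef r

-- Writing is only allowed if the current label flows to the ref's label.
writeLRef :: IORef L -> LRef a -> a -> IO ()
writeLRef cur (LRef l r) x = do
  c <- readIORef cur
  if c <= l
    then writeIORef r x
    else ioError (userError "current label exceeds target label")

main :: IO ()
main = do
  cur    <- newIORef Public
  pub    <- LRef Public <$> newIORef "hello"
  secret <- LRef Admin  <$> newIORef "s3cret"
  writeLRef cur pub "still fine"   -- Public flows to Public: ok
  _ <- readLRef cur secret         -- raises the current label to Admin
  c <- readIORef cur
  if c == Admin then putStrLn "ok" else error "label was not raised"
```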
It is natural that everyone has a certain amount of control over the secrecy and integrity of data, going against the natural order of everything automatically becoming only more secret and less trusted.
For example, if I receive data marked secret to me, I can very well decide that I don't in fact consider it secret, and mark it public. However, if I receive data that is marked secret to me (P) and someone else (Q), i.e., has secrecy P /\ Q, then I can at most remove my secrecy component and declassify it to secrecy Q. I cannot make it completely public, however.
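In a restricted set model (secrecy as a plain conjunction of principals), declassification by a principal simply removes that principal from the secrecy component, and nothing more. A toy sketch, not the lio API:

```haskell
module Main where

import qualified Data.Set as Set

type Principal = String

-- Secrecy as a conjunction of principals: {"P","Q"} stands for P /\ Q.
type Secrecy = Set.Set Principal

-- Declassification: P may remove only its own component. Declassifying
-- P /\ Q as P yields Q, never complete publicity.
declassify :: Principal -> Secrecy -> Secrecy
declassify me = Set.delete me

main :: IO ()
main = do
  let pAndQ = Set.fromList ["P", "Q"]
  if declassify "P" pAndQ == Set.singleton "Q"
       && declassify "P" (Set.singleton "P") == Set.empty
    then putStrLn "ok"
    else error "declassification check failed"
```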
Does that mean that I can't myself read it any more after declassification?
Dually, if I receive untrusted data, I can check it and declare that it is, in fact, trusted by me. I can sign it. It then has integrity P. I can also sign data that already has higher integrity than True, e.g., integrity Q. If I then sign it, it gets integrity P /\ Q. However, I cannot usually sign data in any other way. So I cannot take untrusted data and suddenly claim it has integrity Q.
This is covered in LIO by the concept of privileges.
TODO: As I understand it, privs are values that I can give untrusted code to do things that they are not supposed to do according to their LIO state. I thought declassification and endorsement would in principle work without privs, but I need to think more about this.