Steganography
\subsection{Natural Language Watermarking and Tamperproofing}
In \emph{Natural language watermarking and tamperproofing}, Atallah \emph{et al.} \cite{atallah2003natural} propose a method for steganographically embedding watermarks or fingerprints into plain text. Modern natural language steganography methods hide information in the text itself, without relying on modifications to the formatting parameters of a particular format, such as \LaTeX{} or HTML.
In this work, the authors continue from previous work \cite{atallah2001natural} and use a similar base concept. The concept depends on fundamental redundancies in language structure and semantics. These two separate areas of redundancy are exploited to embed the hidden message.
Firstly, a sentence can be restructured while maintaining exactly the same meaning \cite{bennett2004linguistic}. For example, ``Ashley submitted a perfect assignment'' and ``A perfect assignment was submitted by Ashley.'' In these sentences, the order of the subject and object has been reversed, but the meaning has not changed. Thus, if the meaning of a sentence can be extracted and then reconstructed in a number of different but mathematically defined ways, the specific choice of reconstruction can be used to encode extra information. This is known as the syntactic marking scheme.
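As a rough illustration of the capacity this offers (a sketch under assumptions, not a figure from the cited work), suppose a sentence admits $k$ meaning-preserving reconstructions, where $k$ is a hypothetical count introduced here, and the choice among them carries the hidden data. The number of whole bits that one sentence can then encode is
\[
  b = \lfloor \log_2 k \rfloor ,
\]
so the active/passive pair above ($k = 2$) carries $b = 1$ bit, while a sentence with $k = 5$ usable forms would carry $b = \lfloor \log_2 5 \rfloor = 2$ bits.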
Secondly, the semantics of parts of a sentence, or of sentences within a paragraph, can be changed while preserving the same overall meaning \cite{atallah2003natural}. For example, information can be \emph{grafted} from one sentence to another. Grafting ``The perfect assignment received an HD grade'' into ``Ashley submitted a perfect assignment'' might form ``Ashley submitted a perfect assignment, which received an HD grade.'' Another option is to \emph{prune} redundant data that is shared and repeated between nearby sentences, or to \emph{substitute} data with equivalent data.
Unlike other linguistic steganography methods, these methods allow a hidden message, such as a fingerprint or watermark, to be embedded within a text without modifying the overall meaning of the text. In addition, using the two methods in conjunction makes the document self-error-checking, so that any tampering becomes evident.
The algorithm first steps through the text and modifies the semantics of individual sentences using grafting, pruning and substitution, encoding the hidden message within. The message is encoded using the sentence \ac{TMR} tree \cite{nirenburg2004ontological}. \ac{TMR} provides a method by which a sentence can be encoded and represented by its pure meaning, as opposed to its syntactic structure. A typical sentence can encode up to a byte in the \ac{TMR}; however, in this work only four bits are encoded per sentence, allowing the modified sentences to be \emph{marked}.
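To make the capacity figure concrete (a worked example under assumptions; the watermark length used here is hypothetical and not taken from the cited work): at the stated rate of four bits per marked sentence, a watermark of $m$ bits requires
\[
  n = \left\lceil \frac{m}{4} \right\rceil
\]
marked sentences, so a hypothetical $m = 64$ bit watermark would need $n = 16$ sentences, assuming every sentence used can carry its full four bits.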
After this first pass, a syntactic pass is made to allow for tampering detection. The syntax of each sentence is encoded using its syntax tree. Because the syntax of a sentence is modified without modifying the \ac{TMR}, both the sentence meaning and the watermark encoding are maintained. The syntax tree is modified in such a way as to encode a signature of the document itself. In this way, tampering becomes immediately evident from a syntactic change in the document.