Parsing a document means translating it to a structure the code can use. The result of parsing is usually a tree of nodes that represent the structure of the document. This is called a parse tree or a syntax tree.
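To make the idea of a tree of nodes concrete, here is a minimal sketch in Python. It assumes a tiny arithmetic language rather than any real document format, and the names `Num`, `BinOp`, and `evaluate` are hypothetical, chosen only for illustration. The tree for the expression "2 + 3 - 1" is built by hand to show the shape a parser might produce.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: a literal number."""
    value: int


@dataclass
class BinOp:
    """Inner node: an operator applied to two child nodes."""
    op: str  # "+" or "-"
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]

# Hand-built tree for "2 + 3 - 1", grouped as (2 + 3) - 1.
tree = BinOp("-", BinOp("+", Num(2), Num(3)), Num(1))


def evaluate(node: Node) -> int:
    """Walk the tree bottom-up and compute its value."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == "+" else left - right
```

Once the input is in this form, code can work with it by walking the nodes, as `evaluate(tree)` does, instead of re-reading the original text.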
Parsing is based on the syntax rules the document obeys, that is, the language or format it was written in. Every format you can parse must have a deterministic grammar consisting of a vocabulary and syntax rules.
Parsing can be separated into two sub-processes: lexical analysis and syntax analysis.
Lexical analysis is the process of breaking the input into tokens.
Tokens are the language vocabulary: the collection of valid building blocks.
The lexer, also called the tokenizer, is responsible for breaking the input into valid tokens. It also knows how to strip irrelevant characters such as whitespace and line breaks.
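A lexer of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary (integers plus the `+` and `-` operators) and the names `TOKEN_SPEC` and `tokenize` are hypothetical, not taken from any particular parser.

```python
import re

# Assumed vocabulary: integers and the operators + and -.
# SKIP covers whitespace and line breaks, which are dropped.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (kind, text) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # The character is not part of the vocabulary.
            raise ValueError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For the input `"2 + 3\n- 1"`, `tokenize` returns five tokens (three `NUMBER` tokens and the two operators), with the spaces and the line break removed. Input outside the vocabulary, such as `"2 * 3"`, raises a `ValueError`, since `*` is not a valid building block of this small language.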