
@heetbeet
Created April 18, 2023 11:57
Sequence Probability Calculation

USER

You have two sequences, a long one and a short one. The short one is 3 characters long, the long one is 40 characters long. They are comprised of characters from the latin alphabet (each char can be one of 26 possible chars). What is the probability of the shorter sequence occurring in the longer sequence? Keep everything symbolic right up to the end.

ASSISTANT

Let's denote the short sequence as S and the long sequence as L. The short sequence S has a length of 3 characters, while the long sequence L has a length of 40 characters. We want to find the probability of S occurring in L.

There are 26 possible characters for each position in both sequences, and there are (40-3+1) = 38 possible positions in the long sequence where the short sequence could be placed.

Let's compute the probability of the short sequence occurring at a specific position in the long sequence. Assuming each character of L is drawn independently and uniformly from the 26 letters, each of the 3 characters of S matches with probability 1/26, so the probability of the short sequence matching at a specific position in L is:

P(match at a specific position) = (1/26)^3

Now, we want to compute the probability of S occurring at least once in L. To do this, we compute the complementary probability of S not occurring at any position in L, and then subtract that from 1. Treating the 38 positions as independent (an approximation, since overlapping positions share characters):

P(S not occurring in L) ≈ (1 - P(match at a specific position))^38

P(S occurring in L) = 1 - P(S not occurring in L)
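The product over 38 positions treats the match events as if they were independent, which overlapping positions are not. As a rough check, here is a minimal sketch of an exact calculation for one specific pattern, assuming L is uniformly random over the 26 lowercase letters. The function name `exact_occurrence_probability` and the prefix-matching approach are illustrative choices, not part of the original answer.

```python
from fractions import Fraction
import string


def exact_occurrence_probability(pattern: str, long_len: int,
                                 alphabet: str = string.ascii_lowercase) -> Fraction:
    """Exact P(pattern occurs at least once in a uniform random string of length long_len).

    Hypothetical helper: tracks how many leading characters of the pattern
    are currently matched, and discards strings once a full match occurs.
    """
    m = len(pattern)

    def next_state(state: int, ch: str) -> int:
        # Longest prefix of the pattern that is a suffix of the matched text plus ch.
        candidate = pattern[:state] + ch
        for k in range(min(m, len(candidate)), 0, -1):
            if candidate.endswith(pattern[:k]):
                return k
        return 0

    p_char = Fraction(1, len(alphabet))
    # weights[s] = probability of having s characters matched with no full match so far.
    weights = [Fraction(0)] * m
    weights[0] = Fraction(1)
    for _ in range(long_len):
        new_weights = [Fraction(0)] * m
        for state, weight in enumerate(weights):
            if weight == 0:
                continue
            for ch in alphabet:
                nxt = next_state(state, ch)
                if nxt < m:  # a full match (nxt == m) leaves the "no match" set
                    new_weights[nxt] += weight * p_char
        weights = new_weights
    return 1 - sum(weights)


approx = 1 - (1 - Fraction(1, 26 ** 3)) ** 38
print(float(approx))
print(float(exact_occurrence_probability("abc", 40)))
print(float(exact_occurrence_probability("aaa", 40)))
```

The exact value depends on the pattern itself: a pattern that overlaps with itself, such as "aaa", comes out lower than one that does not, such as "abc", while the independence formula gives a single value for both.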

Now we can plug in the probabilities we've computed:

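P(S occurring in L) ≈ 1 - (1 - (1/26)^3)^38

Treating the 38 positions as independent, this is the approximate probability of the shorter sequence occurring in the longer sequence. Numerically, it is about 0.00216, or roughly 0.2%.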
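To evaluate the symbolic expression only at the very end, a small sketch using exact rational arithmetic could look like this. The function name `approx_occurrence_probability` and its parameters are illustrative assumptions, and the result inherits the independence assumption of the formula above.

```python
from fractions import Fraction


def approx_occurrence_probability(alphabet_size: int, short_len: int, long_len: int) -> Fraction:
    """Evaluate 1 - (1 - k^-m)^(n - m + 1), treating the positions as independent."""
    positions = long_len - short_len + 1               # 40 - 3 + 1 = 38
    p_match = Fraction(1, alphabet_size) ** short_len  # (1/26)^3
    return 1 - (1 - p_match) ** positions


prob = approx_occurrence_probability(26, 3, 40)
print(prob)         # exact fraction
print(float(prob))  # decimal value, converted only at the end
```

Keeping the result as a `Fraction` until the final `float` conversion mirrors the request to stay symbolic for as long as possible.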

USER

That was great. Please write it in latex. You can fill in this template. Remember to use latex mathematical equations properly:

\documentclass{article}
\usepackage{graphicx} % Required for inserting images

\title{<...>}

\begin{document}
\maketitle

<...>

\end{document}

ASSISTANT

Here is the LaTeX template with the probability computation included:

\documentclass{article}
\usepackage{amsmath} % Required for math environments and symbols
\usepackage{graphicx} % Required for inserting images

\title{Probability of a Short Sequence Occurring in a Long Sequence}

\begin{document}
\maketitle

Consider a short sequence $S$ with a length of 3 characters and a long sequence $L$ with a length of 40 characters. Both sequences are comprised of characters from the Latin alphabet, meaning each character can be one of 26 possible characters. We want to find the probability of the short sequence $S$ occurring in the long sequence $L$.

There are 38 possible positions in the long sequence where the short sequence could be placed, as $(40 - 3 + 1) = 38$. The probability of the short sequence matching at a specific position in $L$ is:

\begin{equation}
P(\text{match at a specific position}) = \left(\frac{1}{26}\right)^3
\end{equation}

To compute the probability of $S$ occurring at least once in $L$, we calculate the complementary probability of $S$ not occurring at any position in $L$, and then subtract that from 1. Treating the 38 positions as independent (an approximation, since overlapping positions share characters), we get:

\begin{equation}
P(S \text{ not occurring in } L) \approx \left(1 - P(\text{match at a specific position})\right)^{38}
\end{equation}

\begin{equation}
P(S \text{ occurring in } L) = 1 - P(S \text{ not occurring in } L)
\end{equation}

Now we can plug in the probabilities we've computed:

\begin{equation}
P(S \text{ occurring in } L) \approx 1 - \left(1 - \left(\frac{1}{26}\right)^3\right)^{38}
\end{equation}

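Treating the 38 positions as independent, this is the approximate probability of the shorter sequence occurring in the longer sequence. Numerically, it is about $2.16 \times 10^{-3}$.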

\end{document}

Simply copy and paste this code into a LaTeX editor and compile the document to see the formatted output.
