Let's define $a_{n,m}$ to be the number of valid words of length $n$ with a surplus of $m$ A's, where the surplus is the number of A's minus $k$ times the number of B's. E.g. with $k=2$, AAAB would contribute $1$ to $a_{4,1}$; with $k=3$ it would contribute $1$ to $a_{4,0}$.
Given a valid word of length $n-1$ we can always append an A to get a valid word; we can append a B to get a valid word iff the surplus is at least $k$. So if we set up the generating function $$f(x,z) = \sum_{n \ge 0} \sum_{m=0}^n a_{n,m} x^n z^m$$ we find that $$f(x, z) = 1 + xz f(x, z) + xz^{-k} \sum_{n \ge 0} \sum_{m=k}^n a_{n,m} x^n z^m \tag{1}$$ which doesn't look very promising.
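The recurrence behind $(1)$ is easy to tabulate directly, which gives a quick check on small cases. A minimal sketch in Python, assuming the surplus is the number of A's minus $k$ times the number of B's (the name `surplus_table` is made up for illustration):

```python
def surplus_table(k, n_max):
    """Return rows with rows[n][m] = a_{n,m} for 0 <= m <= n <= n_max.

    Assumes surplus = (#A) - k * (#B), and that a B may be appended
    only when the current surplus is at least k.
    """
    rows = [[1]]  # the empty word: length 0, surplus 0
    for n in range(1, n_max + 1):
        prev = rows[-1]
        cur = [0] * (n + 1)
        for m, count in enumerate(prev):
            cur[m + 1] += count      # append A: surplus goes up by 1
            if m >= k:
                cur[m - k] += count  # append B: surplus goes down by k
        rows.append(cur)
    return rows
```

For instance, `surplus_table(2, 4)[4]` is `[0, 2, 0, 0, 1]`: the two words with surplus $1$ are AAAB and AABA, and AAAA has surplus $4$.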
However, I believe Deutsch proved that this kind of recurrence always gives a Riordan array, so let's assume that we can write $f(x, z)$ in the form $$f(x, z) = \sum_{m \ge 0} z^m d(x) (xh(x))^m \tag{2}$$
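Useful observation: the length minus the surplus is always a multiple of $k+1$ (easily shown by induction: appending an A leaves the difference unchanged, and appending a B increases it by $k+1$), so $d(x)$, which counts the words with surplus $0$, only has non-zero coefficients at powers of $x^{k+1}$, i.e. $d(x) = d_k(x^{k+1})$ for some power series $d_k$.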
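As a sanity check on the divisibility claim (not a proof), one can enumerate the valid words by brute force. This is only a sketch: it assumes the surplus is the number of A's minus $k$ times the number of B's, and that a word is valid when the surplus never drops below zero along the way. The function names are hypothetical.

```python
from itertools import product

def valid_surpluses(k, n):
    """Yield the final surplus of every valid word of length n.

    Assumes surplus = (#A) - k * (#B); a word is valid when the
    running surplus never drops below zero.
    """
    for word in product("AB", repeat=n):
        surplus = 0
        for letter in word:
            surplus += 1 if letter == "A" else -k
            if surplus < 0:
                break  # a B was appended with surplus < k: invalid word
        else:
            yield surplus

def check_divisibility(k, n_max):
    """True if n - m is a multiple of k + 1 for every valid word up to length n_max."""
    return all((n - m) % (k + 1) == 0
               for n in range(n_max + 1)
               for m in valid_surpluses(k, n))
```

Counting the words with surplus $0$ for each length, e.g. `sum(1 for m in valid_surpluses(2, n) if m == 0)`, gives the coefficients of $d(x)$, which vanish unless $n$ is a multiple of $k+1$.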
Useful observation: if we ext