Take a sample of English text, and calculate how often different letters occur in it.
- Break the letter stream into words by treating the space as just another letter that occurs with a certain probability, then make the words more realistic by forcing the distribution of “word lengths” to match that of English
- Generate sentences by choosing words at random, each with the same probability with which it appears in the corpus
- There are about 40,000 commonly used words in English, and a large corpus of text lets us estimate how common each one is.
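
The steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the tiny `sample` string stands in for a real corpus, the function names are made up for this example, and the letter-level generator does not enforce the English word-length distribution (it only treats the space as one more symbol).

```python
import random
from collections import Counter

def letter_frequencies(text):
    """Count letters and spaces, treating the space as just another symbol."""
    symbols = [c for c in text.lower() if c.isalpha() or c == " "]
    return Counter(symbols)

def generate_letters(freqs, n, rng):
    """Draw n symbols independently, each weighted by its observed count."""
    symbols = list(freqs)
    weights = [freqs[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=n))

def word_frequencies(text):
    """Count how often each whitespace-separated word appears."""
    return Counter(text.lower().split())

def generate_sentence(freqs, n_words, rng):
    """Draw n_words words independently, each weighted by its observed count."""
    words = list(freqs)
    weights = [freqs[w] for w in words]
    return " ".join(rng.choices(words, weights=weights, k=n_words))

# Hypothetical stand-in for a large English corpus.
sample = "the cat sat on the mat and the dog sat on the log"
rng = random.Random(0)  # fixed seed so runs are repeatable

print(generate_letters(letter_frequencies(sample), 40, rng))
print(generate_sentence(word_frequencies(sample), 8, rng))
```

Both generators pick each symbol or word independently of what came before, which is why the output has plausible frequencies but no grammar or meaning.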