Zipf originally developed his law in response to the observation that the frequency of a word is inversely proportional to its rank. For example, the 20 most common words in English are listed in the following table.

The table is based on the Brown Corpus, published in 1967. It is a careful count of how often each word was used in a sample of one million words drawn from a wide variety of sources, including newspapers, books, magazines, fiction, government documents, comedy and academic publications.

The most common word, "the", occurred around `70,000` times (or `7%` of the million words counted). The next ranked word, "of", occurred around `3.6%` of the time (about `1/2` as often as the top-ranked word). The third most common word was "and", with a frequency of `2.8%`, or a little over `1/3` of the frequency of the top-ranked word.

In the graph of the first 20 words in the Brown Corpus, the dark blue data points represent the 20 most frequently occurring English words (with the first few labeled). I have also included the "Theoretical Zipf Distribution", based on the n-th ranked word occurring approximately `1/n` times as often as the highest ranked word. This gives us a hyperbola, which we met before.
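The `1/n` rule can be checked against the three Brown Corpus percentages quoted above. The following is a minimal Python sketch that assumes only those three figures ("the" `7%`, "of" `3.6%`, "and" `2.8%`); the function names `zipf_prediction` and `compare` are hypothetical, chosen for illustration. If the data followed the hyperbola `y = k/x` exactly, the product rank × frequency would be the same for every word.

```python
# Observed relative frequencies (percent) for the top three Brown Corpus words,
# as quoted in the text above, listed from most to least frequent.
OBSERVED = {"the": 7.0, "of": 3.6, "and": 2.8}


def zipf_prediction(top_frequency: float, rank: int) -> float:
    """Simplest form of Zipf's law: the n-th ranked word occurs
    about 1/n times as often as the top-ranked word."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return top_frequency / rank


def compare(observed: dict[str, float]) -> list[tuple[int, str, float, float]]:
    """Return (rank, word, observed %, predicted %) rows.

    Ranks follow the insertion order of `observed` (most frequent first).
    """
    words = list(observed)
    top = observed[words[0]]
    rows = []
    for rank, word in enumerate(words, start=1):
        predicted = round(zipf_prediction(top, rank), 2)
        rows.append((rank, word, observed[word], predicted))
    return rows


for rank, word, obs, pred in compare(OBSERVED):
    # On an exact hyperbola y = k/x, rank * frequency would be constant.
    print(f"{rank:>2}  {word:<4} observed {obs:.1f}%  Zipf {pred:.2f}%  "
          f"rank*freq {rank * obs:.1f}")
```

For "and", the simple `1/n` rule predicts about `2.33%`, somewhat below the observed `2.8%`, and the rank × frequency products (`7.0`, `7.2`, `8.4`) drift upward. This is why the fit to the hyperbola is described as approximate rather than exact.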