
sl8

(16,671 posts) Sun Jun 1, 2025, 06:55 AM

Google Books Ngram Viewer

https://books.google.com/ngrams/

Wikipedia article excerpt:
https://en.m.wikipedia.org/wiki/Google_Books_Ngram_Viewer

Google Books Ngram Viewer

The Google Books Ngram Viewer is an online search engine that charts the frequencies of any set of search strings using a yearly count of n-grams found in printed sources published between 1500 and 2022[1][2][3][4] in Google's text corpora in English, Chinese (simplified), French, German, Hebrew, Italian, Russian, or Spanish.[1][2][5] There are also some specialized English corpora, such as American English, British English, and English Fiction.[6]

[Image: Example of an Ngram query]

The program can search for a word or a phrase, including misspellings or gibberish.[5] N-grams are matched against the text of the selected corpus and, if found in 40 or more books, are plotted on a graph.[6] The Google Books Ngram Viewer supports searches for parts of speech and wildcards.[6] It is routinely used in research.

[...]

Usage

Commas delimit user-entered search terms, where each comma-separated term is searched in the database as an n-gram (for example, "nursery school" is a 2-gram or bigram).[6] The Ngram Viewer then returns a plotted line chart. Note that due to limitations on the size of the Ngram database, only matches found in at least 40 books are indexed.[6]
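
To make the comma-delimited query and the 40-book cutoff concrete, here is a minimal Python sketch. It is not the Ngram Viewer's actual code: the function names, the whitespace-based word count, and the book counts are all hypothetical, and the real tokenizer may treat punctuation differently.

```python
MIN_BOOKS = 40  # cutoff stated in the excerpt above


def parse_query(query):
    """Split a comma-delimited query into (term, n) pairs.

    n is the number of whitespace-separated words in the term
    (an assumption; the real tokenizer may differ).
    """
    terms = [t.strip() for t in query.split(",")]
    return [(t, len(t.split())) for t in terms if t]


def indexed_terms(parsed, book_counts):
    """Keep only terms found in at least MIN_BOOKS books."""
    return [t for t, _ in parsed if book_counts.get(t, 0) >= MIN_BOOKS]


parsed = parse_query("nursery school, kindergarten, child care")
# parsed pairs each term with its n: 2-gram, 1-gram, 2-gram

# Hypothetical book counts, made up for illustration only.
book_counts = {"nursery school": 1200, "kindergarten": 5400, "child care": 39}

print(indexed_terms(parsed, book_counts))
```

With these made-up counts, "child care" falls one book short of the cutoff and would not be plotted.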

Limitations

The data sets of the Ngram Viewer have been criticized for their reliance upon inaccurate optical character recognition (OCR) and for including large numbers of incorrectly dated and categorized texts.[11] Because of these errors, and because the corpora are not controlled for bias[12] (such as the increasing amount of scientific literature, which causes other terms to appear to decline in popularity), care must be taken in using the corpora to study language or test theories.[13] Furthermore, the data sets may not reflect general linguistic or cultural change and can only hint at such an effect because they do not involve any metadata like date published, author, length, or genre, to avoid any potential copyright infringements.
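
The bias mentioned above, where a growing body of scientific literature makes other terms appear to decline, can be shown with a little arithmetic. This is a sketch with made-up numbers, assuming the chart plots each term's share of all n-grams for a given year: the term's absolute count stays constant, yet its relative frequency falls because the corpus grows.

```python
def relative_frequency(term_count, total_count):
    """Share of all n-grams in a year that match the term."""
    return term_count / total_count


# Hypothetical counts, for illustration only.
years = [1900, 1950, 2000]
term_counts = {1900: 1000, 1950: 1000, 2000: 1000}  # usage is flat
total_counts = {1900: 1_000_000, 1950: 2_000_000, 2000: 4_000_000}  # corpus grows

freqs = [relative_frequency(term_counts[y], total_counts[y]) for y in years]

for year, freq in zip(years, freqs):
    print(year, f"{freq:.4%}")
```

The printed share halves at each step even though the term is used exactly as often, so a falling line on the chart is not by itself evidence that a term fell out of use.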

[...]