WebCorp Linguist's Search Engine (WebCorp LSE) is a tool for studying language on the web. Its corpora were built by crawling the web and extracting the textual content of web pages. You can search for words, lemmas or phrases, using pattern matching, wildcards and part-of-speech tags. Results are shown as concordance lines in KWIC (Key Word in Context) format. Post-search analyses include time series, collocation tables, sorting, and summaries of metadata from the matched web pages.
WebCorp LSE is being substantially updated during 2022. Even if you had a user account on the old version, you will need to create a new one.
The WebCorp LSE corpora are now annotated with the Stanford CoreNLP tools and include lemmas and part-of-speech categories based on the Universal Dependencies framework. WebCorp LSE also pre-compiles a substantial amount of information for words, lemmas and frequent n-grams. This makes exploring collocations and changes in frequency over time much faster and, we hope, easier than before.
Other WebCorp tools
WebCorp Live: concordance the latest information from the web.