WebCorp Live User Guide

How does it work?

The WebCorp interface is similar to those of standard search engines. You enter a word or phrase, choose options from the menus provided and then press the 'Submit' button. WebCorp works 'on top of' the search engine of your choice: it takes the list of URLs returned by that search engine and extracts concordance lines (examples of your chosen word or phrase in context) from each of those pages. All of the concordance lines are presented on a single results page, with links to the sites from which they came.
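To make the idea of a concordance line concrete, here is a minimal Python sketch of extracting every occurrence of a search term from a page's text, with a fixed amount of context on each side. It is an illustration only and does not reflect how WebCorp is actually implemented; the function name `concordance_lines` and the `span` parameter (measured here in characters) are assumptions made for this example.

```python
import re


def concordance_lines(text, term, span=30):
    """Return (left context, match, right context) for each occurrence of term.

    Hypothetical helper for illustration; `span` is the number of characters
    of context kept on each side of the match.
    """
    # Match the term as a whole word or phrase, ignoring case.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - span):match.start()]
        right = text[match.end():match.end() + span]
        lines.append((left, match.group(0), right))
    return lines


if __name__ == "__main__":
    sample = "The corpus is large. A corpus can be built from the Web."
    for left, key, right in concordance_lines(sample, "corpus", span=10):
        # Right-align the left context so the search term lines up in a column.
        print(f"{left:>10} [{key}] {right}")
```

Aligning the search term in a central column, as the usage example does, is the conventional Key Word in Context layout and makes patterns to the left and right of the term easier to see.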

How is WebCorp different from search engines?

Search engines, such as Google and Bing, are designed to retrieve information from the World Wide Web. They use complex techniques to index the Web and return the documents from their indices that are most relevant to the user's request. WebCorp, by contrast, is designed to retrieve linguistic data from the Web: concordance lines showing the context in which the user's search term occurs. In response to a user query, standard search engines return a list of URLs (page addresses), along with a description of, or some text from, each page to help the user decide which pages are most useful. To view the pages, the user must click on each of the links individually.

WebCorp actually visits each of these pages and extracts concordance lines from them. Although some search engines, such as Google, do give Key Word in Context (KWIC) style output for some of the URLs in the results list, this is not true of all of the URLs, and these short extracts do not show every instance of the search term on each page. The search term may occur many times on a given page, but a Google user could not know this without clicking on each of the links manually. Google is an excellent search engine, but it is not designed as a corpus linguistics tool and is not ideal for this purpose. WebCorp contains options (customisable concordance span, output format, etc.) specifically designed for linguistic research.

Why is WebCorp slow to return results?

WebCorp is slower than search engines because, although it has a search engine-like interface, its aims and the way it works are very different. To conduct a full linguistic analysis of how a particular word or phrase is used on the Web without WebCorp, you would have to use a search engine to find a list of pages containing the word or phrase, access each of the URLs in this list manually, locate each example of the word or phrase on the page and copy these examples into a file. WebCorp automates this whole process, which is why it is slower than a standard search engine, but it is still a vast time-saver over the equivalent manual process.
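The process described above (take a list of URLs, visit each page, find every example and collect them in one place) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not WebCorp's actual code: `build_concordance` and `fetch_text` are hypothetical names, and the page fetcher is passed in as a parameter so that the sketch runs without network access. The loop also shows where the time goes, since every page in the list has to be retrieved and scanned before the results can be presented.

```python
import re


def build_concordance(urls, term, fetch_text, span=20):
    """Collect (url, context) pairs for every occurrence of term on every page.

    Hypothetical sketch: `fetch_text` is any function that takes a URL and
    returns the page's plain text; `span` is the context width in characters.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    results = []
    for url in urls:
        try:
            text = fetch_text(url)  # one retrieval per URL: the slow step
        except Exception:
            continue  # assumption: a page that cannot be retrieved is skipped
        for match in pattern.finditer(text):
            start = max(0, match.start() - span)
            results.append((url, text[start:match.end() + span]))
    return results


if __name__ == "__main__":
    # Stand-in for the Web: a dictionary mapping made-up URLs to page text.
    pages = {
        "http://example.org/a": "Web as corpus. The corpus grows.",
        "http://example.org/b": "No match here.",
    }
    for url, context in build_concordance(list(pages), "corpus", pages.__getitem__):
        print(url, "|", context)
```

In a real setting `fetch_text` would download and clean each page, which is far slower than the dictionary lookup used here.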