Our new large-scale web search engine. A tailored linguistic search engine for accessing the web as corpus.
We are currently developing a fully tailored linguistic search engine, which will be significantly faster and will offer a wider range of processing options.
WebCorp Live relies on standard web search engines such as Google and Bing, adding layers of refinement specifically for linguistic analysis. This process is inherently slower than a search of a fully indexed and pre-processed database or corpus and, as WebCorp Live has grown to serve hundreds of thousands of users worldwide, speed has become more of an issue.
WebCorp Linguist's Search Engine is powered by our own search technology, developed at Birmingham City University. Our specially designed web crawler, parser, tokeniser, indexer and other components allow us to cache and process large sections of the web. The new architecture has allowed us to enhance the sentence boundary detection, date identification, 'junk' (or 'boilerplate') removal, collocation and other statistical analysis options currently available in WebCorp. Additional pre-processing includes grammatical tagging and language detection, and the engine also supports full pattern matching and wildcard search.
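As a rough illustration of the kind of processing described above, the following Python sketch splits text into sentences, tokenises each sentence and counts the collocates of a search term within a fixed span. It is not the WebCorp implementation: the function names, the regular expressions and the default span of four words are all assumptions made for this example, and a production system would use far more robust sentence boundary detection.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive boundary rule (an assumption for this sketch): break after
    # '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    # Lower-case word tokens, keeping simple apostrophe forms such as "linguist's".
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


def collocates(text, node, span=4):
    # Count words occurring within `span` tokens either side of `node`,
    # never crossing a sentence boundary.
    counts = Counter()
    node = node.lower()
    for sentence in split_sentences(text):
        tokens = tokenise(sentence)
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


if __name__ == "__main__":
    sample = "The web is a corpus. Linguists search the web daily."
    print(collocates(sample, "web", span=2).most_common())
```

Restricting the window to a single sentence is one common design choice for collocation analysis, which is why the quality of sentence boundary detection matters for the statistics that follow.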
The WebCorp Linguist's Search Engine is available here.