WebCorp Advanced Wordlist Generator Guide Publications   Feedback
 
Advanced Options: Hypertext

WebCorp automatically filters out markup, scripts, meta tags, 'alt' text on images and similar non-content material. The Advanced Interface provides two options for additional filtering:

[Screenshot: WebCorp Hypertext Options]

  • Exclude link text: This option excludes hyperlink text from the concordances produced, i.e. any text appearing between <A HREF="http://somesite.com"> and </A>. It is useful for removing navigation menus, but hyperlink text is sometimes vital to the meaning of a sentence, so the option is not enabled by default.

  • Exclude wildcard match to e-mail address: This option prevents e-mail addresses from being matched by any wildcards included in your search term.
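
The effect of the two options can be illustrated with a short Python sketch. This is not WebCorp's own code: the names strip_link_text and wildcard_matches are hypothetical, the e-mail pattern is deliberately simplified, and the wildcard '*' is assumed to stand for any run of non-whitespace characters.

```python
import re
from html.parser import HTMLParser


class LinkTextStripper(HTMLParser):
    """Collect page text, skipping anything between <a ...> and </a>."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0  # greater than 0 while inside a hyperlink
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <A HREF=...> arrives as "a".
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.link_depth == 0:
            self.parts.append(data)


def strip_link_text(html):
    """Hypothetical helper: return the text of `html` without link text."""
    parser = LinkTextStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Simplified e-mail pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def wildcard_matches(pattern, text, exclude_email=True):
    """Hypothetical helper: return the tokens of `text` matching `pattern`.

    Assumption: '*' stands for any run of non-whitespace characters.
    """
    regex = re.compile(r"\S*".join(re.escape(p) for p in pattern.split("*")))
    hits = []
    for match in re.finditer(r"\S+", text):
        token = match.group().strip(".,;:!?()\"'")
        if exclude_email and EMAIL.search(token):
            continue  # skip e-mail addresses entirely
        if regex.fullmatch(token):
            hits.append(token)
    return hits


page = 'See <A HREF="http://somesite.com">our menu</A> for details.'
print(strip_link_text(page))

sentence = "Send information to info@example.com today"
print(wildcard_matches("info*", sentence))
print(wildcard_matches("info*", sentence, exclude_email=False))
```

With the e-mail exclusion switched on, the search term info* still matches "information" but no longer matches the address info@example.com.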


 

 © 1999-2008 Research and Development Unit for English Studies