Support

If you have a question that isn't answered here, then please contact us.

What is WebExtract?
Does this program contain ANY malware?
System requirements
Good to know about Windows XP SP2
How do I start it?
How do I use it?
How do I save the emails?
What is a keyword?
How do I get started as quickly as possible?
How can I find sites and emails from specific countries?
How can I find emails from a specific domain?
How can I find each email on a specific site?
Why should I purchase the full version?
Diskcrawl
URL list
Connection Settings
Expert
Email parsing
Proxies
Deep crawling
Status
It won't start, it says I'm missing some file?
No data is being downloaded?
I have a question, how do I reach you?


What is WebExtract?
[Top]
WebExtract is a very powerful email extractor which can find a wide variety of email addresses on websites across the Internet. It can target specific countries, domains and areas of interest. It's simply one of the most powerful tools out there. See the main page for the entire list of features found in WebExtract.


Does this program contain ANY malware?
[Top]
WebExtract is, and always will be, 100% free of all kinds of harmful code such as spyware, adware and viruses. If you don't like the program you can always uninstall it without risking any kind of harm to your system.

System requirements
[Top]
Windows 95/98/2000/2003 Server/ME/XP/Vista/7
256 MB of RAM (512 recommended)
Internet connection


Good to know about Windows XP SP2 [Top]
On Windows XP with Service Pack 2 installed, Microsoft limits the number of simultaneous incomplete (half-open) outbound connection attempts to 10, which slows down programs that open many connections at once. This limit is easy to remove by running an unofficial patch found at lvllord.de. Remember that this is a third-party patch and it's only recommended for experienced computer users.

How do I start it?
[Top]
After you have installed it, open the Start menu and click on 'Programs'. There you will see a group called 'WebExtract Shareware'.


How do I use it?
[Top]
WebExtract basically has two different modes, keyword extraction and URL extraction. The two modes differ only in how websites are found. Under the first tab in the program ("Keywords") you can enter different phrases and words that you want the program to use when querying the search engine. The sites found will be relevant to the words and phrases added under the "Keywords" tab.

The other mode is under the tab "URL list", where you can enter URL's manually. The URL's entered in the list will be spidered for emails and no search engines will be used. Remember that if you want to use the URL extraction mode you need to enable it under that tab by checking "Work in URL list mode".

Those are basically the two different ways WebExtract can find emails.
Read on to see explanations of the other tabs and functions in the program.

How do I save the emails?
[Top]
When you click on "Start" you will be asked if you are ready to begin, press "Yes". In the next window that appears you set where the emails will be saved. By default, emails are saved automatically when more than 100,000 have been found. This can be changed by setting a different number in the field "Autosave emails when more than this number have been found" under the tab "Expert". The emails will also be saved when you click on "Stop" or when the search has been completed. At the end of the session WebExtract will also show you where the emails have been saved in case you forgot.

What is a keyword?
[Top]
Keywords are an essential part of the program. They are used to query the search engine in order to find sites to harvest for emails.

This is how it works.

1. You add the keyword "cats" in the keyword list and click on "Start".
2. WebExtract goes to the search engine and makes a search using the keyword.
3. The search engine processes the keyword and returns the matching sites.
4. WebExtract starts visiting the sites returned by the search engine in order to find emails.
5. When the program is running low on sites it takes the next keyword in the list and starts from step 2.
6. When no more sites can be found, the program stops.

That's how it works.

How do I get started as quickly as possible?
[Top]
First of all you will need some keywords that will be used with the search engine to get sites to harvest. For now, click on the tab "Keywords" and then click on the "Import test keywords" button to import some common keywords that you can test with. Now click on the "Connection settings" tab and define the number of Harvesting connections that you would like to use. You should use about 50 if your connection is a slow or medium fast DSL or Cable connection. About 100-200 if you have a good connection, fast DSL, Cable or a T1. You should use 200 or above if your connection is faster than a T1. If you are unsure about this setting, set it to 100.

You are now ready to begin harvesting (we will get into the other settings in the next section).
Click on start. It will ask you if you are ready to begin harvesting, click on "Yes". You will now need to select the path where all the accepted emails will be saved to. Define a path or let the default path be as it is, your choice. Then click on "Done".

It will now begin to query the search engine with the keywords that you have defined, it will get as many sites as defined in the connection settings, then it will harvest them for emails. There is no need to do anything more now, it will work until it runs out of keywords and sites. Good luck!

How can I find sites and emails from specific countries?
[Top]
If you want to use your keywords to find sites and emails from for example Australia then do the following:

The easiest way is to press F5 to bring up the "Quick Start Wizard" or open it from the "File" menu. It will guide you through all the necessary steps.

If you want to do it manually, follow the steps below:

  1. Go to the tab "Keywords" and add some keywords which you would like to use.
  2. Under the tab "Expert" enter this in the field called "Custom query string": &as_sitesearch=.au
  3. Make sure that "Work in URL list mode" under the tab "URL list" is disabled.
  4. So far you have made the settings to only get sites from Australia from the search engine. If you also want to make sure that all the emails that you find are truly Australian, then do the following, otherwise skip to step 5: Under the tab "Email parsing" select the option "Only accept emails with these domain extensions" and enter the following in the box below that option: .au
  5. You're all set, click on "Start" to begin.

If you want to change the country then just replace .au in steps 2 and 4 with any other country code, which you can find in the list below. For example, use .fi for Finland, .eg for Egypt or .ar for Argentina.

If this didn't work then click on the button "Reset" and wait five seconds to return all the settings to normal. Then start from step 1 again. If it still doesn't work then contact support.

Here are all the country code top level domains for each country in the world that has internet access.

.ac – Ascension Island
.ad – Andorra
.ae – United Arab Emirates
.af – Afghanistan
.ag – Antigua and Barbuda
.ai – Anguilla
.al – Albania
.am – Armenia
.an – Netherlands Antilles
.ao – Angola
.aq – Antarctica
.ar – Argentina
.as – American Samoa
.at – Austria
.au – Australia
.aw – Aruba
.ax – Aland Islands
.az – Azerbaijan
.ba – Bosnia and Herzegovina
.bb – Barbados
.bd – Bangladesh
.be – Belgium
.bf – Burkina Faso
.bg – Bulgaria
.bh – Bahrain
.bi – Burundi
.bj – Benin
.bm – Bermuda
.bn – Brunei Darussalam
.bo – Bolivia
.br – Brazil
.bs – Bahamas
.bt – Bhutan
.bv – Bouvet Island
.bw – Botswana
.by – Belarus
.bz – Belize
.ca – Canada
.cc – Cocos (Keeling) Islands
.cd – Congo, The Democratic Republic of the
.cf – Central African Republic
.cg – Congo, Republic of
.ch – Switzerland
.ci – Cote d'Ivoire
.ck – Cook Islands
.cl – Chile
.cm – Cameroon
.cn – China
.co – Colombia
.cr – Costa Rica
.cu – Cuba
.cv – Cape Verde
.cx – Christmas Island
.cy – Cyprus
.cz – Czech Republic
.de – Germany
.dj – Djibouti
.dk – Denmark
.dm – Dominica
.do – Dominican Republic
.dz – Algeria
.ec – Ecuador
.ee – Estonia
.eg – Egypt
.eh – Western Sahara
.er – Eritrea
.es – Spain
.et – Ethiopia
.eu – European Union
.fi – Finland
.fj – Fiji
.fk – Falkland Islands (Malvinas)
.fm – Micronesia, Federated States of
.fo – Faroe Islands
.fr – France
.ga – Gabon
.gb – United Kingdom
.gd – Grenada
.ge – Georgia
.gf – French Guiana
.gg – Guernsey
.gh – Ghana
.gi – Gibraltar
.gl – Greenland
.gm – Gambia
.gn – Guinea
.gp – Guadeloupe
.gq – Equatorial Guinea
.gr – Greece
.gs – South Georgia and the South Sandwich Islands
.gt – Guatemala
.gu – Guam
.gw – Guinea-Bissau
.gy – Guyana
.hk – Hong Kong
.hm – Heard and McDonald Islands
.hn – Honduras
.hr – Croatia/Hrvatska
.ht – Haiti
.hu – Hungary
.id – Indonesia
.ie – Ireland
.il – Israel
.im – Isle of Man
.in – India
.io – British Indian Ocean Territory
.iq – Iraq
.ir – Iran, Islamic Republic of
.is – Iceland
.it – Italy
.je – Jersey
.jm – Jamaica
.jo – Jordan
.jp – Japan
.ke – Kenya
.kg – Kyrgyzstan
.kh – Cambodia
.ki – Kiribati
.km – Comoros
.kn – Saint Kitts and Nevis
.kp – Korea, Democratic People's Republic
.kr – Korea, Republic of
.kw – Kuwait
.ky – Cayman Islands
.kz – Kazakhstan
.la – Lao People's Democratic Republic
.lb – Lebanon
.lc – Saint Lucia
.li – Liechtenstein
.lk – Sri Lanka
.lr – Liberia
.ls – Lesotho
.lt – Lithuania
.lu – Luxembourg
.lv – Latvia
.ly – Libyan Arab Jamahiriya
.ma – Morocco
.mc – Monaco
.md – Moldova, Republic of
.me – Montenegro
.mg – Madagascar
.mh – Marshall Islands
.mk – Macedonia, The Former Yugoslav Republic of
.ml – Mali
.mm – Myanmar
.mn – Mongolia
.mo – Macao
.mp – Northern Mariana Islands
.mq – Martinique
.mr – Mauritania
.ms – Montserrat
.mt – Malta
.mu – Mauritius
.mv – Maldives
.mw – Malawi
.mx – Mexico
.my – Malaysia
.mz – Mozambique
.na – Namibia
.nc – New Caledonia
.ne – Niger
.nf – Norfolk Island
.ng – Nigeria
.ni – Nicaragua
.nl – Netherlands
.no – Norway
.np – Nepal
.nr – Nauru
.nu – Niue
.nz – New Zealand
.om – Oman
.pa – Panama
.pe – Peru
.pf – French Polynesia
.pg – Papua New Guinea
.ph – Philippines
.pk – Pakistan
.pl – Poland
.pm – Saint Pierre and Miquelon
.pn – Pitcairn Island
.pr – Puerto Rico
.ps – Palestinian Territory, Occupied
.pt – Portugal
.pw – Palau
.py – Paraguay
.qa – Qatar
.re – Reunion Island
.ro – Romania
.rs – Serbia
.ru – Russian Federation
.rw – Rwanda
.sa – Saudi Arabia
.sb – Solomon Islands
.sc – Seychelles
.sd – Sudan
.se – Sweden
.sg – Singapore
.sh – Saint Helena
.si – Slovenia
.sj – Svalbard and Jan Mayen Islands
.sk – Slovak Republic
.sl – Sierra Leone
.sm – San Marino
.sn – Senegal
.so – Somalia
.sr – Suriname
.st – Sao Tome and Principe
.su – Soviet Union (being phased out)
.sv – El Salvador
.sy – Syrian Arab Republic
.sz – Swaziland
.tc – Turks and Caicos Islands
.td – Chad
.tf – French Southern Territories
.tg – Togo
.th – Thailand
.tj – Tajikistan
.tk – Tokelau
.tl – Timor-Leste
.tm – Turkmenistan
.tn – Tunisia
.to – Tonga
.tp – East Timor
.tr – Turkey
.tt – Trinidad and Tobago
.tv – Tuvalu
.tw – Taiwan
.tz – Tanzania
.ua – Ukraine
.ug – Uganda
.uk – United Kingdom
.um – United States Minor Outlying Islands
.us – United States
.uy – Uruguay
.uz – Uzbekistan
.va – Holy See (Vatican City State)
.vc – Saint Vincent and the Grenadines
.ve – Venezuela
.vg – Virgin Islands, British
.vi – Virgin Islands, U.S.
.vn – Vietnam
.vu – Vanuatu
.wf – Wallis and Futuna Islands
.ws – Samoa
.ye – Yemen
.yt – Mayotte
.yu – Yugoslavia
.za – South Africa
.zm – Zambia
.zw – Zimbabwe


How can I find emails from a specific domain?
[Top]
The easiest way is to press F5 to bring up the "Quick Start Wizard" or open it from the "File" menu. It will guide you through all the necessary steps.

If you want to do it manually, follow the steps below:

This can be divided into two different sections, the first being if you want to find emails that are on specific domain(s) of your choice.
For example, if you chose the domain tripod.com and the program finds the site http://hello.tripod.com/contact.html then ALL the valid emails on that URL will be accepted, regardless of whether they contain @tripod.com or not.

The second being if you want to find emails that contain the domain of your choice.
For example, if you chose the domain tripod.com and the program finds the site http://www.anywhere.com/page.html then ONLY the valid emails which contain @tripod.com on page.html will be accepted.

The following examples will use the domain tripod.com as an example, this can of course be changed to any other domain of your choice.

The first method
1. Go to the tab "Keywords" and add some keywords which you would like to use.
2. Under the tab "Expert" enter this in the field called "Custom query string": &as_sitesearch=tripod.com
3. Make sure that "Work in URL list mode" under the tab "URL list" is disabled.
4. Click on "Start" to begin.

The second method
1. Go to the tab "Keywords" and add keywords in the following format: @tripod.com keyword
where keyword is any common word of your choice. This will greatly increase your chances of getting emails containing the string @tripod.com
2. Under the tab "Email parsing" select the option "Only accept emails with these domains" and enter the following in the box below that option: tripod.com
3. Make sure that "Work in URL list mode" under the tab "URL list" is disabled.
4. Click on "Start" to begin.

If this didn't work then click on the button "Reset" and wait five seconds to return all the settings to normal. Then start from step 1 again. If it still doesn't work then contact support.

How can I find each email on a specific site?
[Top]
The easiest way is to press F5 to bring up the "Quick Start Wizard" or open it from the "File" menu. It will guide you through all the necessary steps.

If you want to do it manually, follow the steps below:

If you want to find emails from for example http://www.nexo-sa.com/asp/news/news.asp and all the other URL's on that site, then do the following:

1. Go to the tab "URL list" and enable the setting "Work in URL list mode".
2. Add the URL at which the harvesting should begin, in this case http://www.nexo-sa.com/asp/news/news.asp
3. Go to the tab "Connection settings" and change the value "Harvesting connections" to 2. It should be this low because that is the maximum number of simultaneous connections a normally configured web server allows from one single IP.
4. Under the same tab, disable the function "Avoid potential spam-traps".
5. Go to the tab "Email parsing" and disable "Discard emails with these usernames".
6. Go to the tab "Deep crawling" and enable that function. Also enable the box that says "Only accept URL's that contain these strings" and enter the following in that box: nexo-sa.com
We do this so that the deep crawling function will only get URL's from the site nexo-sa.com and not from any outside domain.
7. You are good to go, click on "Start" to begin.

If this didn't work then click on the button "Reset" and wait five seconds to return all the settings to normal. Then start from step 1 again. If it still doesn't work then contact support.

Why should I purchase the full version?
[Top]
The free Shareware version has some limitations compared to the registered full version. All these limits will be removed when you purchase the full version. For a list of the limitations please see the buy now page.

Diskcrawl
[Top]
This is a relatively new function in WebExtract which allows you to extract email addresses from local files. These can be on any media of your choice such as hard drives, USB sticks, CDs or DVDs, and you can scan whole drives, specific directories or specific files.

Search drives or directories
This function scans a specific path of your choice and uses the "Pattern" box to define what files to include.

Pattern
This field defines which files are included in the scan. You can use any pattern, such as *cat*.txt to find text files that contain the word "cat" anywhere in the filename, or *.* to include every file in the scan. *.xls would scan only Excel documents. There is an endless combination of patterns you can use.
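To get a feel for how such wildcard patterns select files, here is a minimal sketch in Python using the standard fnmatch module. It is only an illustration: the helper name matches and the sample filenames are made up, and WebExtract's own matching may differ in details. In particular, this sketch only lets *.* match names that contain a dot, whereas the program includes every file for that pattern.

```python
from fnmatch import fnmatchcase


def matches(filename: str, pattern: str) -> bool:
    """Case-insensitive wildcard match; * stands for any run of characters."""
    return fnmatchcase(filename.lower(), pattern.lower())


# Hypothetical filenames, used only to try the patterns out.
files = ["cat.txt", "My_Cats_2009.TXT", "dog.txt", "budget.xls", "README"]

for pattern in ["*cat*.txt", "*.xls", "*.*"]:
    selected = [name for name in files if matches(name, pattern)]
    print(pattern, selected)
```

Trying a pattern against a handful of filenames like this is a quick way to check that it selects what you expect before starting a long scan.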

Minimum size
The smallest size a file can have to be included in the scan. Set this field to 0 to disable the minimum size limit.

Maximum size
The largest size a file can have to be included in the scan. Set this field to 0 to disable the maximum size limit.

Extract from specific files
This function scans specific files of your choice for email addresses. Use the "Add file" button to populate the list.

URL list
[Top]
If you have a list of URL's you would like WebExtract to spider for emails, enter them here. Make sure you enable "Work in URL list mode" for this function to be enabled. When you are working in URL list mode any keywords in the keyword list will not be used.

Work in URL list mode
When you enable this you are entering the URL list mode of WebExtract, meaning that it will not use any keywords that you have added. It will only spider the URL's in the list for emails.

Generate URL's (X to Y, Step Z)
This function can generate URL's using a logical pattern. It will by default (this can be changed) start at X (1) and go up to Y (100) and it will increase with Z (1) until Y (100) has been reached. Click on the "Generate" button to further explore this function. The number generated will replace the string %NUMBER% in your URL.
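The X, Y and Z values can be pictured with a short Python sketch. The function name generate_urls and the example.com address are placeholders for illustration, and the sketch assumes that Y itself is included in the range, which is how the description above reads.

```python
def generate_urls(template: str, start: int = 1, stop: int = 100, step: int = 1) -> list[str]:
    """Replace %NUMBER% in the template with start, start+step, ... up to stop."""
    if step < 1:
        raise ValueError("step must be at least 1")
    # range() excludes its upper bound, so add 1 to include stop (Y) itself.
    return [template.replace("%NUMBER%", str(n)) for n in range(start, stop + 1, step)]


# X = 1, Y = 10, Z = 3 gives the numbers 1, 4, 7 and 10.
urls = generate_urls("http://www.example.com/page%NUMBER%.html", 1, 10, 3)
for url in urls:
    print(url)
```

With the default values (1 to 100, step 1) the same function would produce 100 URL's, one for each number.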

Connection Settings
[Top]
Harvesting connections
This is the number of connections that will be used to connect to the websites found by the searchengine or in your URL list. You should use about 50 if your connection is a slow or medium fast DSL or Cable connection. About 100-200 if you have a good connection, fast DSL, Cable or a T1. You should use 200 or above if your connection is faster than a T1. If you are unsure about this setting, set it to 100.

Searchengine connections
This is the number of connections that will be used to connect to the search engine to get websites. This is disabled by default since the "Emulate human behavior" function under the tab "Expert" is enabled, which in turn disables this. Don't disable that function unless you know what you are doing, as doing so may put you at risk of getting temporarily blocked by the search engine.

Timeout (seconds)
This is the number of seconds it takes for a connection to time out and move on after it has received no response from the server it was connected to. Timeout applies to both the search engine connections and the harvesting connections. The lower you keep this, the faster it will go through websites, but you will also find fewer emails. The higher you set it, the more emails it will find, but it will work slower. A good suggestion is to keep it in the middle. The default of 30 seconds is good.

Sites to find before harvest
This is the minimum number of sites it will accept from the searchengine before starting to harvest, unless you run out of keywords. It works like this, it gets 1000 sites, it harvests them for emails, it gets 1000 sites, it harvests them for emails, it gets 1000 sites... etc. etc. The higher you set it the more memory it will consume. Keep this at 1000 unless you have some specific reason to change it.

Maximum URL's to remember
This is an important setting if you plan on harvesting several million emails. It keeps the duplication of emails from getting too high and prevents the program from visiting the same site twice. It keeps this number of the most recently visited sites in memory and makes sure that none of them are visited again during this session. Note that it can take quite a lot of RAM if you set this too high. You should have at least 256 MB of RAM to set it to 50,000, but 512 MB is preferred for speed and overall performance. Don't change this unless you are positive your computer can handle it, otherwise you may experience poor performance after a few hours.

Save related URL with each email address
If you want to know which site each email was found on, enable this. They will be saved in the format of "site url","email" in the accepted emails file.

Save ALL of the URL's
This will save all URL's WebExtract comes across, regardless if there is an email on it or not. They will be saved in the format of "site url","email" in the accepted emails file.

Keep cache URL formatting
When enabled the program will keep the original google cache formatting URL instead of determining the source URL. They will be saved in the format of "site url","email" in the accepted emails file. This setting will only be used if you have enabled to get sites from the Google cache.

Avoid potential spam-traps
This is checked by default and will make sure that the program doesn't collect any harmful emails. It avoids several different spam-traps and so called honeypots that would cause you a headache in normal cases by possibly getting your server shut down.

Title of sites must contain
With this setting enabled, any site whose title doesn't contain the strings in the box to the right of the checkbox will be ignored and not spidered for email addresses. Separate the strings in this field by using the # character.

Accept no more than this number of Bytes from an URL
This setting will make sure that the program doesn't get stuck downloading huge sites. Keeping it at 150,000 bytes is just fine. 150,000 bytes is about 150 KB, which is about 0.15 MB. It's up to you what you want to keep this at. If you don't know, don't change it.
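If you want to try other limits, the conversion is simple arithmetic. The following Python sketch uses helper names invented for this example and shows both the decimal units used above and the binary units (1 KiB = 1024 bytes) that some tools display instead.

```python
def bytes_to_kb(n: int) -> float:
    """Decimal kilobytes: 1 KB = 1,000 bytes."""
    return n / 1000


def bytes_to_mb(n: int) -> float:
    """Decimal megabytes: 1 MB = 1,000,000 bytes."""
    return n / 1_000_000


def bytes_to_kib(n: int) -> float:
    """Binary kibibytes: 1 KiB = 1,024 bytes."""
    return n / 1024


limit = 150_000
print(bytes_to_kb(limit), "KB")
print(bytes_to_mb(limit), "MB")
print(round(bytes_to_kib(limit), 2), "KiB")
```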

Search engine user-agent, Morphing - These 2 settings work together.
The search engine user-agent is how the program identifies itself to the search engine it communicates with. If you set it to morph, it will generate a different user-agent each time, making it near impossible for anyone except you to keep track of it. It also prevents it from being blocked based on the user-agent. If you uncheck morph user-agent it will identify as the agent you chose in the list of agents. You can also type in your own agent to identify as.

Harvesting user-agent, Morphing - These 2 settings work together.
Same as above, except that this user-agent setting applies to the sites that the program visits and harvests emails from.


Expert
[Top]
Domain name servers (DNS)
Here you can add the DNS servers that the program will use to resolve hosts to IP's. This is a very important factor in getting a good speed and in being able to harvest at all. The faster the DNS is, the faster you can harvest. On startup your default DNS servers will be detected and added to the dropdown list. If you do not wish to use them, simply click on clear list. Then use the field below to add the one of your choice. Good luck.

Emulate human behavior
This will emulate human behavior when getting sites from the search engine. It will prevent you from getting blocked. When enabling this the program will override your setting for "Searchengine connections" and set it to 1. Between each connection it will pause for X seconds plus a short random interval to increase the odds of being taken for a human. Set the number of seconds between each query in the field to the right of this option.

Autosave emails when more than this number have been found
This is the number of emails to keep in memory before saving them to a file. The last X emails found (where X is the value of this field) never contain any duplicates. The higher you set it, the more RAM it will take and the fewer duplicated emails the total list will contain. If you get too many duplicate emails when performing a large harvesting session, try increasing this value. But remember to have a good deal of RAM available if you increase it.

Custom query string
This is a very effective field if you use it right. With it you can add queries the program makes to the search engine just like google adds them in the address bar. Here is an example, if you set the field to: "&as_sitesearch=tripod.com" (without the quotes). Then only sites from the domain tripod.com will be returned in the search results. You can also replace tripod.com with for example .tw to only get sites from Taiwan. If you set the field to "&lr=lang_it" then only sites in Italian will be displayed ("it" being the short for "italian"). You can also combine two settings like this: "&as_sitesearch=.co.uk&lr=lang_en". With that query only sites from the United Kingdom written in English will be returned. You can learn more about what you can add in this field by going to http://www.google.com/advanced_search and testing what the different settings you make change in the address bar. It might be a little complicated at first, but you will soon get the hang of it.
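A custom query string is just a series of name=value pairs joined with & characters. As a rough illustration, the Python sketch below takes the combined example from above apart and puts it back together with the standard urllib.parse module. The helper name split_custom_query is made up for this example and is not part of WebExtract.

```python
from urllib.parse import parse_qsl, urlencode


def split_custom_query(custom: str) -> list[tuple[str, str]]:
    """Split a string like '&a=1&b=2' into (name, value) pairs."""
    return parse_qsl(custom.lstrip("&"))


params = split_custom_query("&as_sitesearch=.co.uk&lr=lang_en")
for name, value in params:
    print(name, "=", value)

# Joining the pairs again gives back the original string.
rebuilt = "&" + urlencode(params)
print(rebuilt)
```

Looking at a query string pair by pair like this makes it easier to spot a missing & or a mistyped parameter name before entering it in the field.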

Extend keywords
This will extend the keywords with various extensions to increase the number of emails you find. If you have it on disabled it will not do this at all, it will be slower but the emails will be very targeted. If you have it on Default it will do it a little and give the best return rate of emails, it will be very fast and the emails will be quite targeted. If you have it on high it will do it a lot and will work a long time on each keyword finding quite many emails, but it won't be very fast and the emails won't be especially targeted.

Only get sites from the Google cache (VERY FAST)
Using this option you have the ability to scan the Google cache for emails. Since all the Google servers are on very fast connections you will never have bad speeds on this one if your keywords are decent. You can retrieve over 2 million emails per hour if you're on a fast connection. It consumes keywords and sites at an impressive rate. Remember, though, that Google will detect this and quickly block your IP for a few hours. This is NOT recommended unless you really know what you are doing.

Do not visit URL's on these domains
When enabled and if you have any domains listed in the field below, then any URL located on either one of the domains entered in the field will be skipped and will not be harvested or even connected to. Useful setting to dodge "bad neighborhoods" on the internet.

Don't show the DNS suggestion window when stopping the program
Self explanatory.

Email parsing
[Top]
Neither
This will make sure that neither of the two functions mentioned below, "Only accept emails with these domain extensions" and "Discard emails with these domain extensions", is used.

Only accept emails with these domain extensions
Only the domains and domain extensions of your choice will be accepted if you select this. You can use this to only accept emails from, for example, Spain and China. You would do this by adding this string to the textbox: .es .cn

Discard emails with these domain extensions
Same as the previous function, except that instead of only accepting the specified extensions, any emails that have the extensions will be discarded.

Only accept emails with these domains
Only the domains of your choice will be accepted if you select this. You can use this to only accept from, for example hotmail.com and aol.com. You would do this by adding this string to the textbox: hotmail.com aol.com

Discard emails with these usernames
Discards any email that has any of the usernames found in the textbox.

Discard emails with these strings
Discards any email that contains any of the strings in the textbox.

Discard emails containing more than this number of digits in a row
Self explanatory.

Discard emails that are longer than this number of characters
Self explanatory.

Never extract more than this number of emails from an URL
This makes sure that you never get more than X emails (300 by default) from one single URL. This is there to make sure that someone just hasn't constructed 50,000 bad emails and put them on their site to mess with you.

Proxies [Top]
At the moment WebExtract supports SOCKS v5, HTTP and HTTPS proxies.

Enable proxysupport on all connections
If you enable this the proxies in the list will be used for harvesting and connecting to the search engine. Make sure that you actually have proxies otherwise it won't work.

Enable proxysupport only on search engine connections
If you enable this the proxies in the list will be used for connecting to the searchengine only. Make sure that you actually have proxies otherwise it won't work.

Proxies are separated with
This defines how each proxy in the list is separated. For example, if your list looks like this: "231.4.12.1:1080 98.24.11.75:1080 56.123.2.1:1080" then each proxy in your list is separated with a blank space. This function only applies if you have chosen to load proxies from an URL or a local file.

Proxy load interval (m)
This defines the interval, in minutes, at which the list of proxies you use is refreshed. Remember that the proxies aren't checked for validity, that is up to you to take care of for the moment. If for example this is set to 2 and you have chosen to load proxies from an URL, then every 2 minutes that URL will be queried for proxies, the current ones will be removed and the new ones will be added. This function only applies if you have chosen to load proxies from an URL or a local file.

Autoload proxies from this URL
Defines the URL from where the list of proxies will be retrieved. Make sure that you enter the correct URL, otherwise it won't work.

Autoload proxies from "proxies.txt" in the application directory
This will load proxies from the file proxies.txt that will be residing in the same directory as the WebExtract executable. Keep this file updated with fresh proxies to keep WebExtract flawlessly working if you are using proxies. This is especially useful if you have a program which is constantly finding fresh proxies for you.


Deep crawling
[Top]
This is a very effective function if you want to go deeper into a site. Keep in mind though that you are less likely to come across emails using this function. If you activate deep crawling the program will follow any links on sites it comes across that match the settings which you define. Then it will attempt to extract more emails and sites from the URL's it finds until the maximum depth level has been reached (depth level and maximum sites to extract from each site can be altered to fit your preference). Enabling this feature will give you the ability to find a practically unlimited number of sites with as little as one URL, or one keyword, as a starting point.

Neither
This will make sure that neither of the two functions mentioned below, "Ignore URL's that contain these strings" and "Only accept URL's that contain these strings", is used.

Ignore URL's that contain these strings
Doesn't visit any URL that contains any of the strings defined in the textbox.

Only accept URL's that contain these strings
Only visits an URL that contains any of the strings defined in the textbox.

Never go deeper than this number of levels
If you visit one site and the program finds a few links, then you visit one of those links, then you are on level 2 in depth. If you visit yet another url found on one of the links on depth 2, then you will be at depth 3.
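The idea of depth levels can be shown with a small Python sketch that works on a made-up, in-memory map of which page links to which. The page names and the function pages_within_depth are hypothetical, and no network access is involved. As in the description above, the starting page counts as level 1.

```python
from collections import deque


def pages_within_depth(links: dict[str, list[str]], start: str, max_depth: int) -> dict[str, int]:
    """Return the depth level of every page reachable within max_depth levels."""
    depth = {start: 1}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # links found on this page would exceed the limit
        for target in links.get(page, []):
            if target not in depth:  # keep the first (shallowest) level found
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Toy link map: home links to about and news, news to story, story to archive.
site = {"home": ["about", "news"], "news": ["story"], "story": ["archive"]}
levels = pages_within_depth(site, "home", max_depth=3)
for page, level in levels.items():
    print(page, level)
```

With a limit of 3 levels the page "archive" is never reached, because it would sit on level 4.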

Never extract more than this number of URL's per site
This is the maximum amount of URL's that will be extracted from each individual URL the program comes across. It will extract the X first URL's it finds on each URL.


Status [Top]
This will keep you updated on what is happening in the program: how many emails have been accepted, how many have been discarded (emails are only discarded based on your email parsing settings), how many are invalid, how many accepted emails you collect each hour, how many sites are left before the search engine will be queried for more, how much data you have received and how many keywords are left. When you run out of keywords and sites the search is finished.


It won't start, it says I'm missing some file?
[Top]
If the program says that you're missing an OCX or DLL file then download and install this file from the Microsoft website (restart your computer after you have installed it): Runtime library
If it still won't work then contact us and tell us which file you are missing.


No data is being downloaded? [Top]
This is a common problem if you are running a firewall or other data controlling software which can be blocking WebExtract.

Make sure that you set your firewall to allow WebExtract full access to the Internet. Also make sure that you have administrator access on the system where WebExtract is running.

If that doesn't work then temporarily shut down your firewall and try again.

If you still have a problem then contact us with as much detailed information as possible and we will help you.


I have a question, how do I reach you? [Top]
You can find our contact information on this page. Feel free to contact us with any questions that you might have and we will get back to you as soon as possible, normally within 24 hours.

