NetScanner
NetScanner handles the following document types and links:
- Documents (http/file)
- Absolute links
- Relative links
- FTP links
NetScanner accepts remote as well as local documents for analysis. The document may be any URL of the http or file type. Examples:
- http://www.geocities.com/SiliconValley/Vista/6222/: NetScanner loads the default HTML file for this URL, as set by the server (in this case, www.geocities.com).
- http://www.geocities.com/SiliconValley/Vista/6222/links.html: The URL may also point directly to an HTML file on the Internet.
- file:/users/some_user/HomePage/index.html (or the equivalent path on Windows 95/NT): NetScanner loads the specified file from the local host (the user's own system).
In the second and third examples above, anchors, search strings, or redirections may also be specified.
Refer to A Guide to URLs for a comprehensive description of URLs.
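As a rough illustration of loading both kinds of documents, the Java sketch below reads an http or file URL through the standard java.net URL handlers. The class and method names (DocumentLoader, load) are hypothetical, and this is not NetScanner's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.Charset;

public class DocumentLoader {

    /**
     * Hypothetical helper: loads the complete text of an http: or file: document.
     * Other schemes are rejected, mirroring the two document types listed above.
     */
    static String load(String location, Charset charset) throws IOException {
        URI uri = URI.create(location);
        String scheme = uri.getScheme();
        if (!"http".equalsIgnoreCase(scheme) && !"file".equalsIgnoreCase(scheme)) {
            throw new IOException("Unsupported document type: " + location);
        }
        // The JDK's built-in handlers open both remote (http) and local (file) URLs.
        try (InputStream in = uri.toURL().openStream()) {
            return new String(in.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java DocumentLoader file:/users/some_user/HomePage/index.html
        System.out.println(load(args[0], Charset.forName("ISO-8859-1")));
    }
}
```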
Absolute Links
Absolute http links are checked by opening a network connection to the web server specified in the link URL. NetScanner does not download the file; it only checks for a response from the server. The response codes are specified in RFC 1945; an HTML version of this RFC is available at http://www.ics.uci.edu/pub/ietf/http/rfc1945.html. For a brief description, see HTTP response messages.
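One way to check a link without downloading the file is an HTTP HEAD request, which asks the server for the status line and headers only. The sketch below assumes that approach; the page does not say which request method NetScanner actually sends, and HttpLinkCheck and the 10-second timeouts are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class HttpLinkCheck {

    /** Checks an absolute http link and returns "code reason-phrase". */
    static String check(String link) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(link).toURL().openConnection();
        conn.setRequestMethod("HEAD");           // headers only; no document body is sent
        conn.setInstanceFollowRedirects(false);  // report 3XX responses instead of following them
        conn.setConnectTimeout(10_000);          // hypothetical 10 s limits
        conn.setReadTimeout(10_000);
        try {
            // getResponseCode() returns 4XX/5XX codes without throwing
            return conn.getResponseCode() + " " + conn.getResponseMessage();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java HttpLinkCheck http://www.ics.uci.edu/pub/ietf/http/rfc1945.html
        System.out.println(check(args[0]));
    }
}
```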
Relative Links
A relative link such as <a href="welcome.html">Welcome</a> is not a complete URL, so the location of the file that contains the link is used to resolve it into an absolute link. For example, if the link <a href="welcome.html">Welcome</a> appears in the document http://www.welcome.org/index.html, the location http://www.welcome.org/ is used, and the link resolves to http://www.welcome.org/welcome.html.
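The resolution step above can be sketched with java.net.URI, whose resolve method follows the RFC 2396 rules for relative references. RelativeLink is a hypothetical name; NetScanner's own resolution code is not shown on this page.

```java
import java.net.URI;

public class RelativeLink {

    /** Resolves an href found in the document at documentUrl into an absolute URL. */
    static String resolve(String documentUrl, String href) {
        // The last path segment of the base ("index.html") is replaced by href;
        // "." and ".." segments are normalized, and absolute hrefs are returned unchanged.
        return URI.create(documentUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.welcome.org/index.html", "welcome.html"));
        // prints http://www.welcome.org/welcome.html
    }
}
```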
FTP Links
To check an ftp link, NetScanner by default first attempts an anonymous login with a dummy password. If the login succeeds and the link specifies a file path, NetScanner then checks whether that path exists on the ftp server.
For a description of the diagnostic response messages, see FTP response messages.
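The FTP check described above can be sketched over a plain socket using the RFC 959 command/reply exchange. This is an assumption-laden sketch, not NetScanner's implementation: it probes the path with CWD (directory) and then SIZE (file, from RFC 3659, which not every server supports), and the class name, the "guest@" dummy password, and the timeouts are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class FtpLinkCheck {

    /** Reads one (possibly multi-line) FTP reply and returns its 3-digit code. */
    static int readReply(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line == null || line.length() < 3) {
            throw new IOException("Missing or malformed FTP reply");
        }
        String code = line.substring(0, 3);
        // A multi-line reply starts with "xyz-" and ends with a line starting "xyz ".
        if (line.length() > 3 && line.charAt(3) == '-') {
            String end = code + " ";
            do {
                line = in.readLine();
                if (line == null) throw new IOException("Truncated FTP reply");
            } while (!line.startsWith(end));
        }
        return Integer.parseInt(code);
    }

    /** Sends one command line and returns the reply code. */
    static int command(Writer out, BufferedReader in, String cmd) throws IOException {
        out.write(cmd + "\r\n");
        out.flush();
        return readReply(in);
    }

    /** Returns a NetScanner-style status for ftp://host/path (empty path = login only). */
    static String check(String host, String path, String password) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, 21), 10_000);
            s.setSoTimeout(10_000);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.US_ASCII);
            if (readReply(in) != 220) return "Error while connecting";
            int reply = command(out, in, "USER anonymous");
            if (reply == 331) reply = command(out, in, "PASS " + password); // 331 = need password
            if (reply != 230) return "Login failed";                        // 230 = logged in
            if (path.isEmpty()) return "OK";
            if (command(out, in, "CWD " + path) == 250) return "OK";          // path is a directory
            command(out, in, "TYPE I");                                       // some servers refuse SIZE in ASCII mode
            return command(out, in, "SIZE " + path) == 213 ? "OK" : "File not found";
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java FtpLinkCheck ftp.example.org /pub/README
        System.out.println(check(args[0], args.length > 1 ? args[1] : "", "guest@"));
    }
}
```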
Anchors
Anchors in URLs allow browsers to point to a specific anchored (or named) part of a document.
Anchors are not technically part of the URL that is sent to the server; resolving them is the client's (e.g., a browser's) responsibility. As a result, if the file that should contain an anchor is valid but the anchor itself is missing, HTTP defines no response code or reason-phrase for that case. NetScanner therefore first checks whether the file specified in the link is valid. If it is, NetScanner downloads the complete file and searches it for an HTML tag that defines the anchor. If no such tag is found, NetScanner reports an "Anchor not found" message.
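The anchor search described above might look like the following sketch, which takes the already downloaded HTML as a string. It assumes an anchor is defined by a name= attribute (as in <a name="intro">) or an id= attribute; the regular expressions are a simplification that ignores comments and ">" inside attribute values, and AnchorCheck is a hypothetical name.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorCheck {

    // Any start tag, e.g. <a name="intro"> or <h2 id='faq'>.
    private static final Pattern TAG =
            Pattern.compile("<[a-z][^>]*>", Pattern.CASE_INSENSITIVE);

    // A name= or id= attribute with a double-quoted, single-quoted, or unquoted value.
    private static final Pattern ANCHOR_ATTR = Pattern.compile(
            "\\s(?:name|id)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);

    /** Returns the anchor part of a link (after '#'), or null if it has none. */
    static String anchorOf(String link) {
        return URI.create(link).getFragment();
    }

    /** Returns true if the HTML text defines the given anchor. */
    static boolean hasAnchor(String html, String anchor) {
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {
            Matcher attr = ANCHOR_ATTR.matcher(tag.group());
            while (attr.find()) {
                String value = attr.group(1) != null ? attr.group(1)
                        : attr.group(2) != null ? attr.group(2)
                        : attr.group(3);
                if (anchor.equals(value)) {
                    return true;
                }
            }
        }
        return false; // the caller would report "Anchor not found"
    }
}
```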
Search Strings
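Search strings (the part of a URL after "?") are part of the standard URL syntax and are resolved automatically, because they are sent to the server along with the rest of the request.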
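To contrast search strings with anchors, the sketch below computes which part of a link would appear in an HTTP request line: the path plus any search string, with the anchor dropped because the client resolves it. RequestTarget is a hypothetical name used only for illustration.

```java
import java.net.URI;

public class RequestTarget {

    /** Returns the path and search string of a link; the #anchor is not sent to the server. */
    static String of(String link) {
        URI uri = URI.create(link);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // "http://host" requests the server's default document
        }
        String query = uri.getRawQuery(); // the search string, still URL-encoded
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(of("http://www.welcome.org/cgi-bin/search?q=net+scanner#results"));
        // prints /cgi-bin/search?q=net+scanner
    }
}
```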
HTTP Messages
The following is a brief description of the response messages for HTTP links, extracted from RFC 1945. According to that RFC, the server responds with a three-digit response code followed by a reason-phrase. Response codes are categorized by the first digit of the status code as follows:
- 1XX Informational: Not used, but reserved for future use.
- 2XX Success: The action was successfully received, understood, and accepted.
- 3XX Redirection: Further action must be taken in order to complete the request.
- 4XX Client Error: The request contains bad syntax or cannot be fulfilled.
- 5XX Server Error: The server failed to fulfill an apparently valid request.
Some examples of response codes and reason-phrases are given below.
- 200 OK
- 201 Created
- 202 Accepted
- 204 No Content
- 301 Moved Permanently
- 302 Moved Temporarily
- 304 Not Modified
- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 500 Internal Server Error
- 501 Not Implemented
- 502 Bad Gateway
- 503 Service Unavailable
Whenever a response code and reason-phrase can be obtained for a link, NetScanner displays the reason-phrase next to the link. If the response code is not in the 2XX class, NetScanner also writes an error message to the Java Console, since such responses indicate bad links. As long as the response code is in the 2XX class, the link should be considered valid, even if the reason-phrase is not one of the 2XX phrases listed above.
See the miscellaneous errors section for other messages that NetScanner generates. These usually appear when a valid response could not be obtained from the server.
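The response-code classes above can be applied to a raw HTTP/1.0 status line ("HTTP-Version SP Status-Code SP Reason-Phrase") as in the sketch below. StatusLine and its methods are hypothetical names; the 2XX rule follows the validity criterion stated above.

```java
public class StatusLine {

    final int code;
    final String reason;

    StatusLine(int code, String reason) {
        this.code = code;
        this.reason = reason;
    }

    /** Parses a status line such as "HTTP/1.0 404 Not Found". */
    static StatusLine parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/") || !parts[1].matches("\\d{3}")) {
            throw new IllegalArgumentException("Not an HTTP status line: " + line);
        }
        return new StatusLine(Integer.parseInt(parts[1]), parts.length == 3 ? parts[2] : "");
    }

    /** Category taken from the first digit of the response code. */
    String category() {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    /** Any 2XX code marks a valid link, whatever the reason-phrase says. */
    boolean isValidLink() {
        return code / 100 == 2;
    }
}
```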
FTP Messages
- Login failed: Anonymous (or other, if specified) login failed.
- File not found: The specified directory/file is not found on the ftp server.
Miscellaneous Errors
- Host unknown: The hostname specified in the URL is unknown.
- Protocol unknown: The protocol specified in the URL is unknown.
- Error while connecting: The host could not be reached due to network errors.
- Unknown Error: Internal error.
- Unchecked: This is the default status message when a document is first loaded. After a scan, this message indicates that a valid connection to the URL could not be established within the specified number of seconds.
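In a Java checker, the messages above could plausibly be derived from the exceptions raised while connecting. The mapping below is a hypothetical sketch, not NetScanner's documented behavior: it treats every MalformedURLException as "Protocol unknown" (the JDK also raises it for other malformed URLs) and treats a timeout as "Unchecked".

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class ScanStatus {

    /** Maps an exception raised while checking a link to a NetScanner-style message. */
    static String describe(Exception e) {
        // Order matters: the specific IOException subclasses are tested first.
        if (e instanceof UnknownHostException) return "Host unknown";
        if (e instanceof MalformedURLException) return "Protocol unknown";
        if (e instanceof SocketTimeoutException) return "Unchecked"; // no answer within the time limit
        if (e instanceof IOException) return "Error while connecting";
        return "Unknown Error";
    }
}
```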
Last modified: Fri Jun 18 11:47:52 India Standard Time 1999
© Subbu Allamaraju 1998, 1999. All rights reserved.
All copyrights and trademarks acknowledged.