The Harvester


This unit closely resembles a conventional harvesting unit, which simply gathers material that is fully specified by a given set of URLs. Its job has a further dimension, however: the Harvester must be able not only to download regular, static documents but also to forward queries to a web server. That is, the harvesting module must support, for example, the standard GET and POST operations. This capability is necessary for retrieving dynamic pages and will be needed at a later stage.
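The difference between the two operations can be sketched briefly. The following is only an illustration using Python's standard library, not the Harvester's actual implementation; the function names `build_request` and `fetch` and the example URL are assumptions made for this sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(url, params=None, method="GET"):
    """Build a GET or POST request for `url` with optional query parameters.

    Hypothetical helper for illustration; not part of the original module.
    """
    encoded = urlencode(params or {})
    if method == "POST":
        # POST carries the parameters in the request body.
        return Request(url, data=encoded.encode("ascii"), method="POST")
    # GET appends the parameters to the URL as a query string.
    if encoded:
        separator = "&" if "?" in url else "?"
        url = url + separator + encoded
    return Request(url, method="GET")


def fetch(request, timeout=10):
    """Send the request and return the raw response body as bytes."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

A static document would be retrieved with `fetch(build_request(url))`, whereas a query to a dynamic page would pass `params` and, where the server expects it, `method="POST"`.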

Additionally, the Harvester compiles meta-information that it obtains while communicating with the server. One important piece of information for the further handling of a request is, for example, the type of the document. If the document is an image, the subsequent processing steps can be skipped, since an image is unlikely to contain any interactive elements.
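How such meta-information might drive the decision to skip later steps can be sketched as follows. This is a hedged illustration only: the names `collect_meta` and `needs_further_processing`, the choice of header fields, and the rule that every `image/*` type is skipped are assumptions, since the text mentions only the document type and pictures as an example.

```python
def collect_meta(headers):
    """Extract a few fields from a mapping of HTTP response headers.

    `headers` is assumed to be a plain dict keyed by the canonical header
    names; a real response object may offer case-insensitive lookup instead.
    """
    raw_type = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = raw_type.split(";")[0].strip().lower()
    return {
        "media_type": media_type,
        "content_length": headers.get("Content-Length"),
        "last_modified": headers.get("Last-Modified"),
    }


def needs_further_processing(meta):
    """Return False for images, which the later steps can skip."""
    return not meta["media_type"].startswith("image/")
```

A document whose type is missing or unknown is passed on in this sketch, so that nothing is skipped merely because the server omitted the header.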

Andreas Aschenbrenner