Grimoire-Command.es

GNU+Linux command memo

doc_crawler: download all the PDFs of a website


$ doc_crawler.py http://a.com > url.lst (1)
$ doc_crawler.py --download-files url.lst (2)
$ doc_crawler.py --download-file http://a.com/file.txt (3)
$ doc_crawler.py --accept=jpe?g$ --download --single-page --wait=3 https://a.com/a_page (4)
1 Create a url.lst file containing the links to all the PDF, ODT, DOC, ZIP… files found while recursively exploring the pointed website
2 Download all the listed files (as a second step, after you have checked that the list is correct; see the sketch after this list)
3 Download files one by one, to retry those that failed, for instance
4 Download all the photos from a single-page web gallery; every photo found is downloaded directly on the fly, and the random wait between two requests never exceeds 3 seconds
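
Between steps (1) and (2), the list can be reviewed and trimmed with standard tools before anything is fetched. A minimal sketch; the pdf.lst file name and the grep pattern are illustrative, not part of doc_crawler:

$ grep -i '\.pdf$' url.lst > pdf.lst    # keep only the links ending in .pdf
$ less pdf.lst                          # eyeball the result before downloading
$ doc_crawler.py --download-files pdf.lst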

doc_crawler.py can be downloaded here, or installed via PyPI.
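
For a PyPI install, something along these lines should work; the package name doc_crawler and the python3 -m module invocation are assumptions based on common packaging conventions, not confirmed by this memo:

$ pip install doc_crawler                           # assumed package name on PyPI
$ python3 -m doc_crawler http://a.com > url.lst     # assumed module entry point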