How to download all links from a webpage (including hidden ones) on CentOS

November 5, 2010

Posted by Tournas Dimitrios in Linux.

If you want to dump all the links on a page to a text file, including hidden ones, you can use “lynx”. This is useful for debugging, for bookmarking every link on a page you are interested in, or for downloading all files of a particular type from a page, say every mp3 or jpg file on an index page. On CentOS 5.5 lynx is not installed by default, so run "yum install lynx" first.

Well, let's see it in action:

As you probably know, this blog is archived by category, and each category is indexed separately. So let's dump all links from my ActionScript category index.

# lynx -dump https://tournasdimitrios1.wordpress.com/articles-index/actionscript/ | grep 2010

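The result is shown below. The first two lines are not links; they appear because grep keeps every line that contains "2010":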

 CAPTION: November 2010
          + 6,120 visitors since May-2010
  14. https://tournasdimitrios1.wordpress.com/2010/09/24/understanding-how-as3-manages-depths/
  15. https://tournasdimitrios1.wordpress.com/2010/09/22/try-catch-finally-statement-in-actionscript-3/
  16. https://tournasdimitrios1.wordpress.com/2010/09/22/as3-loops-while-vs-for-vs-for-each/
  17. https://tournasdimitrios1.wordpress.com/2010/09/22/the-basics-of-using-css-in-as3/
  18. https://tournasdimitrios1.wordpress.com/2010/09/22/using-the-switch-conditional-in-actionscript-3-0/
  19. https://tournasdimitrios1.wordpress.com/2010/09/22/understanding-as3-super-statement-in-actionscrip-3/
  20. https://tournasdimitrios1.wordpress.com/2010/09/22/the-with-statement-in-actionscript-3/
  21. https://tournasdimitrios1.wordpress.com/2010/09/21/object-oriented-actionscript-3-using-php-sql/
  22. https://tournasdimitrios1.wordpress.com/2010/09/21/sendreceive-variables-from-actionscript-3-php-an-oop-approach/
  23. https://tournasdimitrios1.wordpress.com/2010/09/19/tutorial-actionscript3-%E2%80%93-loading-sound-and-play-as-loop/
  24. https://tournasdimitrios1.wordpress.com/2010/09/18/quick-and-easy-geturl-class/
  25. https://tournasdimitrios1.wordpress.com/2010/09/09/building-a-frame-rate-counter-in-actionscript-3-0/
  26. https://tournasdimitrios1.wordpress.com/2010/09/02/prng-a-seed-based-pseudorandom-number-generator-in-actionscript/
  27. https://tournasdimitrios1.wordpress.com/2010/08/30/simple-example-of-randomizing-the-order-of-an-array-in-actionscript/
  28. https://tournasdimitrios1.wordpress.com/2010/08/24/open-source-tweening-engines-for-actionscript/
  29. https://tournasdimitrios1.wordpress.com/2010/08/14/using-the-transitionmanager-class-in-as3-2/
  30. https://tournasdimitrios1.wordpress.com/2010/08/15/using-caurina-tweener-in-as3-to-move-objects/
  31. https://tournasdimitrios1.wordpress.com/2010/08/01/using-tweenlite-with-flash-cs3-the-basics/
  32. https://tournasdimitrios1.wordpress.com/2010/08/14/creating-animations-using-the-as3-tween-class/
  33. https://tournasdimitrios1.wordpress.com/2010/08/14/timing-an-animation-in-actionscript-3/
  34. https://tournasdimitrios1.wordpress.com/2010/08/14/using-the-date-class-in-actionscript-3-0/
  35. https://tournasdimitrios1.wordpress.com/2010/08/13/using-the-timer-class-in-actionscript-3-0/
  36. https://tournasdimitrios1.wordpress.com/articles-index/2010/08/07/working-with-xml-e4x-and-actionscript-3/
  37. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/30/action-script-error-repository/
  38. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/30/security-error-accessing-flash-content-from-other-domain/
  39. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/30/how-to-create-a-crossdomain-xml-file/
  40. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/28/applying-rot128-encryption-on-bytearray/
  41. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/28/embedding-binary-xml/
  42. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/28/saving-xml-as-binary/
  43. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/23/the-abcs-of-amf-format/
  44. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/11/the-singleton-pattern/
  45. https://tournasdimitrios1.wordpress.com/articles-index/2010/07/03/the-parentheses-operator-in-actionscript/
  46. https://tournasdimitrios1.wordpress.com/articles-index/2010/06/04/free-api-posters-for-as3-and-flex-sdk/
  47. https://tournasdimitrios1.wordpress.com/articles-index/2010/06/04/30-more-awesome-open-source-as3-libraries/
  48. https://tournasdimitrios1.wordpress.com/articles-index/2010/06/04/30-%e2%80%9cmust-try%e2%80%9d-open-source-actionscript-3-libraries/
  50. https://tournasdimitrios1.wordpress.com/2010/10/
  51. https://tournasdimitrios1.wordpress.com/2010/11/02/
  52. https://tournasdimitrios1.wordpress.com/2010/11/03/
  53. https://tournasdimitrios1.wordpress.com/2010/11/04/
  54. https://tournasdimitrios1.wordpress.com/2010/11/05/

If you want the same links without the leading numbers, saved to a file, you can use:

lynx -dump https://tournasdimitrios1.wordpress.com/articles-index/actionscript/ | awk '/http/{print $2}' | grep 2010 > mylinks.txt

The contents of mylinks.txt are then:

https://tournasdimitrios1.wordpress.com/2010/09/24/understanding-how-as3-manages-depths/
https://tournasdimitrios1.wordpress.com/2010/09/22/try-catch-finally-statement-in-actionscript-3/
https://tournasdimitrios1.wordpress.com/2010/09/22/as3-loops-while-vs-for-vs-for-each/
https://tournasdimitrios1.wordpress.com/2010/09/22/the-basics-of-using-css-in-as3/
https://tournasdimitrios1.wordpress.com/2010/09/22/using-the-switch-conditional-in-actionscript-3-0/
https://tournasdimitrios1.wordpress.com/2010/09/22/understanding-as3-super-statement-in-actionscrip-3/
https://tournasdimitrios1.wordpress.com/2010/09/22/the-with-statement-in-actionscript-3/
https://tournasdimitrios1.wordpress.com/2010/09/21/object-oriented-actionscript-3-using-php-sql/
https://tournasdimitrios1.wordpress.com/2010/09/21/sendreceive-variables-from-actionscript-3-php-an-oop-approach/
https://tournasdimitrios1.wordpress.com/2010/09/19/tutorial-actionscript3-%E2%80%93-loading-sound-and-play-as-loop/
https://tournasdimitrios1.wordpress.com/2010/09/18/quick-and-easy-geturl-class/
https://tournasdimitrios1.wordpress.com/2010/09/09/building-a-frame-rate-counter-in-actionscript-3-0/
https://tournasdimitrios1.wordpress.com/2010/09/02/prng-a-seed-based-pseudorandom-number-generator-in-actionscript/
https://tournasdimitrios1.wordpress.com/2010/08/30/simple-example-of-randomizing-the-order-of-an-array-in-actionscript/
https://tournasdimitrios1.wordpress.com/2010/08/24/open-source-tweening-engines-for-actionscript/
https://tournasdimitrios1.wordpress.com/2010/08/14/using-the-transitionmanager-class-in-as3-2/
https://tournasdimitrios1.wordpress.com/2010/08/15/using-caurina-tweener-in-as3-to-move-objects/
https://tournasdimitrios1.wordpress.com/2010/08/01/using-tweenlite-with-flash-cs3-the-basics/
https://tournasdimitrios1.wordpress.com/2010/08/14/creating-animations-using-the-as3-tween-class/
https://tournasdimitrios1.wordpress.com/2010/08/14/timing-an-animation-in-actionscript-3/
https://tournasdimitrios1.wordpress.com/2010/08/14/using-the-date-class-in-actionscript-3-0/
https://tournasdimitrios1.wordpress.com/2010/08/13/using-the-timer-class-in-actionscript-3-0/
https://tournasdimitrios1.wordpress.com/articles-index/2010/08/07/working-with-xml-e4x-and-actionscript-3/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/30/action-script-error-repository/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/30/security-error-accessing-flash-content-from-other-domain/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/30/how-to-create-a-crossdomain-xml-file/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/28/applying-rot128-encryption-on-bytearray/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/28/embedding-binary-xml/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/28/saving-xml-as-binary/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/23/the-abcs-of-amf-format/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/11/the-singleton-pattern/
https://tournasdimitrios1.wordpress.com/articles-index/2010/07/03/the-parentheses-operator-in-actionscript/
https://tournasdimitrios1.wordpress.com/articles-index/2010/06/04/free-api-posters-for-as3-and-flex-sdk/
https://tournasdimitrios1.wordpress.com/articles-index/2010/06/04/30-more-awesome-open-source-as3-libraries/
https://tournasdimitrios1.wordpress.com/articles-index/2010/06/04/30-%e2%80%9cmust-try%e2%80%9d-open-source-actionscript-3-libraries/
https://tournasdimitrios1.wordpress.com/2010/10/
https://tournasdimitrios1.wordpress.com/2010/11/02/
https://tournasdimitrios1.wordpress.com/2010/11/03/
https://tournasdimitrios1.wordpress.com/2010/11/04/
https://tournasdimitrios1.wordpress.com/2010/11/05/
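
To see what the awk and grep stages do without touching the network, here is a minimal sketch that feeds them a few lines shaped like lynx's link list. The function name extract_links and the example.com URLs are made up for illustration; only the awk and grep filters come from the command above.

```shell
# Hypothetical helper: keep only the URL column of lynx-style link lines.
extract_links() {
    # Link lines look like "  14. https://example.com/page".
    # awk selects lines containing "http" and prints their second field.
    awk '/http/ {print $2}'
}

# Sample input imitating "lynx -dump" output (made-up URLs).
sample='   CAPTION: November 2010
  14. https://example.com/2010/09/24/first-post/
  15. https://example.com/2010/09/22/second-post/
  16. https://example.com/about/'

# The CAPTION line has no "http", so awk drops it;
# link 16 has no "2010", so grep drops it.
printf '%s\n' "$sample" | extract_links | grep 2010
```

Only the two 2010 URLs should come out, one per line and without their numbers.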

If you want to download files of a specific type, grep the output for that file type, then use a bash for loop and wget to download them.

It could be something like this:

lynx -dump http://somesite.com/page.html | awk '/http/{print $2}' | grep jpg > /tmp/file.txt
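
Note that a plain "grep jpg" keeps any URL with "jpg" anywhere in it, not only image files. A stricter filter might look like the sketch below. The helper name filter_by_ext and the sample URLs are assumptions for illustration, and the pattern assumes the URLs end in the extension (no query string after it).

```shell
# Hypothetical helper: keep URLs from stdin that end in the given extension.
filter_by_ext() {
    # -i ignores case, so .JPG also matches;
    # "\." and the trailing "$" tie the match to the end of the URL.
    grep -i "\.$1\$"
}

# Made-up sample URLs, one per line.
urls='https://example.com/img/photo.jpg
https://example.com/img/PHOTO2.JPG
https://example.com/jpg-tips.html
https://example.com/song.mp3'

printf '%s\n' "$urls" | filter_by_ext jpg
```

Here the two image URLs pass, while jpg-tips.html (which a plain "grep jpg" would keep) and song.mp3 are dropped.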

Then use a simple for loop to download the files:

for i in $(cat /tmp/file.txt); do wget "$i"; done

This is just one example of what you can do with this approach. There are other ways to download specific files from a site, some of them probably easier than this one.
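
One such variation on the for loop above, sketched under a few assumptions: it reads the list line by line instead of word-splitting the whole file, skips blank lines, and has a dry-run switch so you can check what would be fetched. The function name download_list and the DRY_RUN variable are invented for this sketch and are not part of wget.

```shell
# Hypothetical helper: fetch every URL listed in a file, one URL per line.
# Set DRY_RUN=1 to print the wget commands instead of running them.
download_list() {
    # "|| [ -n ... ]" also handles a last line with no trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue      # skip blank lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "wget $url"
        else
            wget "$url"
        fi
    done < "$1"
}

# Dry-run demo with made-up URLs; nothing is downloaded.
list=$(mktemp)
printf '%s\n' 'https://example.com/a.jpg' '' 'https://example.com/b.jpg' > "$list"
DRY_RUN=1 download_list "$list"
rm -f "$list"
```

Running download_list /tmp/file.txt without DRY_RUN would call wget once per listed URL, just like the original loop.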
