Download website text only
-
- PlutoniumLounger
- Posts: 15669
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
Download website text only
I am looking to download ONLY the text of a specific web site (no MP3 files, no images) and wonder what people are using in the year 2020.
I gave www.httrack.com a quick spin, but it seemed to whir away for an hour and then paused, restarting when I clicked somewhere.
I am looking for just a quick grab of the text. I will be analyzing it later, so I will have access to my own text-cleanup tools.
Thanks
Chris
He who plants a seed, plants life.
-
- 3StarLounger
- Posts: 327
- Joined: 25 Jan 2010, 17:36
Re: Download website text only
I think you'll find Nuke Anything what you're looking for:
https://download.cnet.com/Nuke-Anything ... 48044.html
-
- 3StarLounger
- Posts: 205
- Joined: 02 Feb 2010, 23:58
Re: Download website text only
If by any chance you are using the Chrome browser or the new Edge browser, there is a Chrome extension called Reader View that also works with the new Edge.
It has options to print, save, change the font size and even the background shade, etc. There is also a toggle to hide/show images.
Firefox has a similar extension, but it is rather limited in my opinion.
Hope this helps.
-
- UraniumLounger
- Posts: 9300
- Joined: 13 Feb 2010, 01:27
- Location: Deep in the Heart of Texas
Re: Download website text only
curious wrote: ↑16 Sep 2020, 23:01
I think you'll find Nuke Anything what you're looking for:
https://download.cnet.com/Nuke-Anything ... 48044.html
I followed the link to CNET without problems, but the CNET page has no option to download the software. It only has a link to the software publisher's page, and that took me to a 404 page.
Bob's yer Uncle
Dell Intel Core i5 Laptop, 3570K, 1.60 GHz, 8 GB RAM, Windows 11 64-bit, LibreOffice, and other bits and bobs
(1/2)(1+√5)
-
- Administrator
- Posts: 78678
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Download website text only
If you're using Chrome or Edge: Nuke Anything Enhanced
If you're using Firefox: Nuke Anything Enhanced
Warning: the extension hasn't been updated since 2017, so it might not be compatible with the current browser version.
Best wishes,
Hans
-
- UraniumLounger
- Posts: 9300
- Joined: 13 Feb 2010, 01:27
- Location: Deep in the Heart of Texas
Re: Download website text only
Thanks, Hans!
-
- 3StarLounger
- Posts: 327
- Joined: 25 Jan 2010, 17:36
Re: Download website text only
BobH -
Sorry you had that problem. Hans' suggestion is most likely an updated version, so I urge you to try that.
-
- BronzeLounger
- Posts: 1243
- Joined: 25 Jan 2010, 22:25
- Location: Pickering, Ontario, Canada
Re: Download website text only
HansV wrote: ↑17 Sep 2020, 18:58
If you're using Chrome or Edge: Nuke Anything Enhanced
If you're using Firefox: Nuke Anything Enhanced
Warning: the extension hasn't been updated since 2017, so it might not be compatible with the current browser version.
I have often thought that there must be some tool to filter out unwanted material. Thanks Hans ... and I did see your warning.
Regards,
Bob
-
- PlutoniumLounger
- Posts: 15669
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
Re: Download website text only
Hi jolas, and my apologies for the delay.
This is close to what I want, but I do not want to hide the non-text.
I want the site-downloader to filter out the non-text automatically.
So far the applications I've tried either
(1) download the entire site, so that I must then write a post-processor to filter out the non-text, or
(2) provide switches which, in my stumbling way, I can't get to work.
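For option (1), the post-processor need not be large. What follows is only a minimal sketch, assuming the downloaded pages are ordinary HTML files and that Python's standard library is acceptable; the names `TextOnly` and `html_to_text` are made up for illustration and do not come from any of the tools mentioned in this thread.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text content, ignoring tags and script/style bodies."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []        # pieces of text, in document order
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML document, one text run per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Images, audio and video are referenced by tags such as `<img>`, so they drop out on their own; only the text between tags is kept. Running `html_to_text` over each saved file would leave plain text for the later clean-up tools.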
Cheers
Chris
-
- PlutoniumLounger
- Posts: 15669
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
Re: Download website text only
Thanks Hans.
I believe I expressed myself poorly in the original post.
I'm looking for an application that will download an entire web site (or at least, will download everything at or below the url) AND will strip out all non-text material as it goes.
Think "off-line analysis of user comments" as an example.
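A request like this can be sketched in a few dozen lines. The following is a rough illustration only, assuming the site is plain HTML served without a login and that a Python script is acceptable; `PageParser`, `fetch_html` and `crawl_text` are hypothetical names, not part of any existing tool. It follows only `<a href>` links that start with the starting URL, never reads the body of anything that is not `text/html`, and pauses between requests so as not to hammer the server.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collect the text and the <a href> links of one HTML page."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text, self.links, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def fetch_html(url):
    """Return the page as a string, or None if it fails or is not HTML."""
    try:
        with urlopen(url, timeout=30) as resp:
            if resp.headers.get_content_type() != "text/html":
                return None  # body of images, MP3s etc. is not read
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except OSError:  # covers URLError and HTTPError (e.g. a 404)
        return None


def crawl_text(start_url, fetch=fetch_html, max_pages=100, delay=1.0):
    """Breadth-first crawl at or below start_url; returns {url: text}."""
    queue, seen, out = deque([start_url]), {start_url}, {}
    while queue and len(out) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        time.sleep(delay)  # be polite to the server
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        out[url] = "\n".join(parser.text)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute, no #fragment
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return out
```

Calling `crawl_text("https://example.com/comments/")` (a placeholder URL) would return a dictionary mapping each visited page to its text. The `max_pages` and `delay` defaults are arbitrary; a real run should also respect the site's robots.txt, which this sketch does not check.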
Cheers
Chris
-
- 3StarLounger
- Posts: 205
- Joined: 02 Feb 2010, 23:58
Re: Download website text only
You may be aware of this already: Chrome, the new Edge browser and Firefox all have a Save As / Save Page As option when you right-click inside the web page.
For Chrome and Edge there are three options; saving as "HTML only" leaves placeholders for non-text objects. This is probably useful for simply structured web pages.
For Firefox, interestingly, there is a "Text Files" option in addition to "Web Page, complete" and "Web Page, HTML only".
-
- PlutoniumLounger
- Posts: 15669
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
Re: Download website text only
Hi Jolas.
I'm not looking to save "a page".
I'm looking for an application that will download an entire web site.
If Chrome/Firefox/Mosaic browsers can let me point to a site with a URL, and with one click download every page on that site, then they would be a candidate application.
I do not want to save 1,000 pages, one click at a time!
Cheers
Chris
-
- Administrator
- Posts: 78678
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: Download website text only
Imagine downloading Eileen's Lounge, with tens of forums, each with many pages; more than 33000 topics, many of which have more than one page; the member list with more than 150 pages; plus all the other pages. An application would have to know the structure of the Lounge to do this in any meaningful way, and if it did, it would probably cause the Lounge to crash. So don't do it!
Best wishes,
Hans
-
- PlutoniumLounger
- Posts: 15669
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
Re: Download website text only
Oh, I won't, and it's not on EL anyway.
I used to d/l the 250+ posts from Canada NewsWire six days a week, when it was text-based (some 10-15 years ago), and then parse/extract all press releases for the Toronto/Mississauga area looking for direct lines to CEOs of large financial/pharmaceutical firms.
Prior to that I was commissioned to scour the Toronto Police blotters (again text-based) to provide an Alert service based on postal codes.
Also there was a commission to access Google Financials (and two other sites) for fiscal data on publicly traded companies for a guy who had a foolproof formula for working out which shares to buy. Far as I know he's still living in Toronto; Loser!
This new laptop is not very strong, so any attempt by me to d/l a massive web site will be doomed before I click "OK".
If I were dissecting the Toronto Police Blotter or Canada Newswire today I'd have to cope with all sorts of crud - Links to a/v files, images etc, none of which help in determining a postal code from a few street names, once you've found the street names!
The closest I got to downloading Google was an analyser that would issue a Google Search based on a Canadian postal code ("L4X 2G6"), grab the one page of about one hundred hits, parse each hit, and obtain a good directory of that block of the street ("L4X 2G6"), the street ("L4X 2G"), or the area ("L4X 2").
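The three levels of search key can be derived from any postal code found in a block of text. This is just a small sketch of that idea in Python, not the original analyser; the function name `search_levels` and the regular expression are assumptions made for illustration.

```python
import re

# Canadian postal code: letter-digit-letter, optional space, digit-letter-digit.
POSTAL = re.compile(r"\b([A-Z]\d[A-Z])\s?(\d[A-Z]\d)\b")


def search_levels(text):
    """Return a (block, street, area) tuple for each postal code in text."""
    levels = []
    for first3, last3 in POSTAL.findall(text.upper()):
        block = f"{first3} {last3}"          # e.g. "L4X 2G6"
        levels.append((block, block[:-1], block[:-2]))
    return levels
```

For the example code in this post, `search_levels("L4X 2G6")` gives `[("L4X 2G6", "L4X 2G", "L4X 2")]`, matching the block, street and area keys described above.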
Cheers
Chris
-
- Administrator
- Posts: 12635
- Joined: 16 Jan 2010, 15:49
- Location: London, Europe
Re: Download website text only
There are lots of tools that do this; they are typically used by search engines to extract the data they want to index.
Try using your favourite web search engine to search for "Web crawler".
StuartR
-
- PlutoniumLounger
- Posts: 15669
- Joined: 24 Jan 2010, 23:23
- Location: brings.slot.perky
Re: Download website text only
Thank you Stuart.
I have downloaded three to try:
setup-cyowcopy-1.8.0-build-652
getleft-setup-v1.2-full
httrack-3.49.2
Chris