October 19, 2017

Octoparse Web Scraper Review

I’ve been searching for a tool that will collect data on businesses within a certain market sector.  I need something that is semi-automatic, ethical, in continuous development, and with good support.  So I’ve downloaded a copy of Octoparse and I’m now working through the tutorials while writing this Octoparse Web Scraper Review.

If you’ve searched for such tools yourself (or any other software for that matter) you’ve probably spotted that there are many that promise a great deal but which haven’t been updated for a year or more.  There are fake reviews that are blatant affiliate promotion pages, and there are one or two tools that look promising.

It’s time-consuming finding the right product and you don’t know who to trust.

I should therefore be transparent and state that I’m writing this review because the team behind Octoparse are offering a free month of Pro subscription to anyone who writes a review.  They didn’t say the review had to be a five star endorsement, just a review, so here’s mine.

My Review of Octoparse Web Scraper

I registered for a free account on their website and downloaded the trial copy.  As soon as this was installed there was a notification of an update so the first thing I did was to update it to the latest copy.  I find it a little reassuring that there’s a recent patch or upgrade.

The UI is pleasantly free from distractions and advertising.  As with any unfamiliar piece of software, it takes a few minutes to find your way around, but there are handy built-in tutorials, or you can dive straight in and start a task.

There’s a wizard option to build tasks or you can use advanced mode. I chose the wizard option as this is all new to me.

I set up the task and entered the URL that contained a list of companies with links to pages giving more details of each. I wanted to collect all the contact information for each company.

By clicking on example fields in the internal browser window the software quickly learns what to collect.  The only niggle I had was capturing the company’s URL.  Perhaps that’s my inexperience or a limitation in the trial copy.

Anyway, within a few minutes I had captured about 50 records and exported the data into an Excel spreadsheet.
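For readers curious about what a point-and-click tool like this is doing behind the scenes, here is a rough sketch of the same idea in plain Python: read a list page, pick out each company link, and write the results to a CSV file that a spreadsheet can open. It is only an illustration and not how Octoparse itself works. The sample HTML, the `directory.example` address, and the names `LinkCollector` and `links_to_csv` are all made up for this example.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (link text, absolute URL) pairs for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the link currently being read, if any
        self._text = []     # pieces of text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into full URLs.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html, base_url):
    """Return CSV text with one row per link found in the HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["company", "details_url"])
    writer.writerows(parser.links)
    return out.getvalue()


# Hypothetical list page standing in for a real directory.
SAMPLE = """
<ul>
  <li><a href="/companies/acme">Acme Ltd</a></li>
  <li><a href="/companies/bolt">Bolt and Co</a></li>
</ul>
"""

print(links_to_csv(SAMPLE, "https://directory.example/list"))
```

A real directory would also need each details page fetched and parsed for the contact fields, which is exactly the laborious part that the wizard handles by example.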

As this was my first attempt it took a little while, but already I can see how the Octoparse Web Scraper will save me a lot of time.  I need information from several pages of a directory.  It’s not a lot, but it’s a laborious process to do manually and consequently I tend to procrastinate or find other things to do.

This tool will save a lot of that time and effort, and I’ll be able to get on with using the information it gathers.

Ethics of Web Scraping

It’s easy to stray off the straight and narrow path when using tools like this that gather information.  If someone has gone to all the trouble of building a web directory there is the potential for someone to scrape all the data and build a duplicate directory under another name.  That of course would be plagiarism and unethical, not to mention illegal.

However, data is published in directories so that others can find it and if you need to contact many companies within the same market sector then there’s no harm in saving some time and effort by gathering that data and putting it into a spreadsheet for future reference.

So my advice is:

  • Don’t overdo it. Take only what you need.
  • Don’t ever re-use the data and pass it off as your own.
  • Give something back to the source, e.g. create quality backlinks, use their adverts, buy something from their site, or recommend them in appropriate forums and on social media.
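If you ever script this kind of collection yourself, one practical way to "take only what you need" is to respect the site's robots.txt file and to pause between requests. The sketch below is my own illustration of that idea using Python's standard `urllib.robotparser` module. The robots.txt contents, the URLs, the user agent name, and the helper `polite_plan` are all assumptions invented for the example, and the one-second fallback delay is an arbitrary choice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary directory site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical pages we would like to visit.
URLS = [
    "https://directory.example/companies/acme",
    "https://directory.example/private/notes",
]


def polite_plan(robots_txt, urls, user_agent="my-research-bot"):
    """Return (URLs the site allows us to fetch, seconds to wait between them)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    allowed = [u for u in urls if rules.can_fetch(user_agent, u)]
    # Fall back to one second if the site does not state a crawl delay.
    delay = rules.crawl_delay(user_agent) or 1.0
    return allowed, delay


allowed, delay = polite_plan(ROBOTS_TXT, URLS)
for url in allowed:
    # A real script would fetch the page here, then call time.sleep(delay).
    print(f"would fetch {url}, then wait {delay} seconds")
```

This does not settle the ethical questions above, since a robots.txt file says nothing about how the data may be reused, but it keeps the load on the source site small.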