*****  To join INSNA, visit http://www.insna.org  *****

I think the easiest way is through Ruby. The way to do it depends on
what you're looking for.

If you're just interested in getting pages' content, the simplest way
is through the open-uri library in Ruby's standard library
(http://ruby-doc.org/stdlib-2.2.2/libdoc/open-uri/rdoc/OpenURI.html).
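
A minimal sketch of the open-uri route, assuming Ruby 2.5 or later
(where URI.open is available). The fetch_page helper name and the
http/https check are illustrative choices, not part of open-uri:

```ruby
require "open-uri"

# Fetch the body of a web page and return it as a String.
# fetch_page is a hypothetical helper name used only for this sketch.
def fetch_page(url)
  uri = URI.parse(url)

  # Accept only http/https URLs (URI::HTTPS is a subclass of URI::HTTP),
  # so other schemes are rejected before any request is made.
  raise ArgumentError, "not an http(s) URL: #{url}" unless uri.is_a?(URI::HTTP)

  # URI.open is the open-uri entry point on Ruby 2.5+. On older Rubies,
  # such as the 2.2.2 documented at the link above, open(url) was used.
  URI.open(uri) { |io| io.read }
end

# Example call (performs a real network request):
#   html = fetch_page("http://www.insna.org")
#   puts html[0, 200]
```

This only retrieves the raw HTML; pulling links out of it is a parsing
job.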

If you want to parse content, there are several options using Ruby
gems, like the following:

* Nokogiri, which is, I guess, the most popular
(http://railscasts.com/episodes/190-screen-scraping-with-nokogiri)
* Mechanize, which is built on top of Nokogiri
(http://railscasts.com/episodes/191-mechanize)
* Hpricot (https://rubygems.org/gems/hpricot)
* Or just do screen scraping with ScrAPI
(https://rubygems.org/gems/scrapi); see the ScrAPI RailsCast
(http://railscasts.com/episodes/173-screen-scraping-with-scrapi).

--Moses

On Thu, Jun 2, 2016 at 7:14 PM, Juergen Pfeffer <[log in to unmask]> wrote:
> Not the answer you were hoping for, but instead of playing around with tools
> with limitations, I’d recommend finding a CS undergrad. This can be done in
> 1-2 hours with about 15 lines of Python code.
>
> Best,
>
> Jürgen (former CS undergrad)
>
> From: Jennifer Lawlor
> Sent: Thursday, June 2, 2016 5:59 PM
> To: [log in to unmask]
> Subject: [SOCNET] Web Crawler Recommendations
>
> Hi all,
>
> I'm working on a project involving hyperlink networks and I'm looking for
> some software tools. Can anyone recommend software for web crawling that
> takes multiple seed links and can output data in a universal format (e.g., a
> .csv)? I'm hoping to avoid writing the code for a crawler from scratch, so
> any advice you can offer about pre-existing software would be really
> helpful!
>
> Best,
> Jennifer Lawlor
>
> --
>
> Jennifer Lawlor, MA
> Graduate Student, Ecological-Community Psychology
> Michigan State University
> E-mail: [log in to unmask]
>
> _____________________________________________________________________
> SOCNET is a service of INSNA, the professional association for social
> network researchers (http://www.insna.org). To unsubscribe, send an
> email message to [log in to unmask] containing the line UNSUBSCRIBE
> SOCNET in the body of the message.
