*****  To join INSNA, visit http://www.insna.org  *****

I second Jurgen's suggestion.

A person with some experience with a scripting language (Python, Ruby,
etc.) will be able to do exactly what you need. If you do not have
programming experience, it would take you quite some time to learn it
yourself. Existing tools have their own limitations too, so they might
not do exactly what you want.
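
To give a sense of scale, here is a rough sketch of what such a script
might look like in Python. It is only an illustration under assumptions,
not a finished tool: the names crawl, extract_links, fetch_html and
write_edges_csv are made up for this example, it uses only the standard
library, and it ignores robots.txt and rate limiting, which a real crawl
should respect. It takes multiple seed links and writes a source/target
edge list to a .csv file.

```python
import csv
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def crawl(seeds, max_pages=50, fetch=fetch_html):
    """Breadth-first crawl from several seeds; returns (source, target) edges."""
    queue = deque(seeds)
    seen = set(seeds)
    edges = []
    pages_fetched = 0
    while queue and pages_fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages_fetched += 1
        for target in extract_links(url, html):
            edges.append((url, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return edges


def write_edges_csv(edges, path):
    """Write the edge list as a two-column CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target"])
        writer.writerows(edges)
```

Something like crawl(["http://www.insna.org"], max_pages=20) followed by
write_edges_csv(edges, "edges.csv") would then give a file that most
network analysis packages can import. The max_pages cap is there so the
crawl stops instead of wandering across the whole web.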

Best,
Michal
(frequent CS undergrad employer)


On Fri, Jun 3, 2016 at 2:04 AM, Gohar F. Khan <[log in to unmask]> wrote:
> Also try Uberlink's
> hyperlinks analytics tool: http://www.uberlink.com/has
>
> It has a user-friendly interface and can take multiple seed links.
>
> Thanks,
>
> On Friday, 3 June 2016, Moses Boudourides <[log in to unmask]>
> wrote:
>>
>> I think the easiest way is through Ruby. The way to do it depends on
>> what you're looking for.
>>
>> If you're just interested in getting pages' content, the simplest way
>> is through the open-uri functions
>> (http://ruby-doc.org/stdlib-2.2.2/libdoc/open-uri/rdoc/OpenURI.html).
>>
>> If you want to parse content, there are several options using Ruby
>> gems, like the following:
>>
>> * Nokogiri, which is, I guess, the most popular
>> (http://railscasts.com/episodes/190-screen-scraping-with-nokogiri)
>> * Mechanize, which is built on top of Nokogiri
>> (http://railscasts.com/episodes/191-mechanize)
>> * Hpricot (https://rubygems.org/gems/hpricot)
>> * Or just do screen scraping with ScrAPI
>> (https://rubygems.org/gems/scrapi) and the ScrAPI RailsCast
>> (http://railscasts.com/episodes/173-screen-scraping-with-scrapi).
>>
>> --Moses
>>
>> On Thu, Jun 2, 2016 at 7:14 PM, Juergen Pfeffer <[log in to unmask]>
>> wrote:
>> > Not the answer you were hoping for, but instead of playing around with tools
>> > with limitations, I’d recommend finding a CS undergrad. This can be done in
>> > 1-2 hours with about 15 lines of Python code.
>> >
>> > Best,
>> >
>> > Jürgen (former CS undergrad)
>> >
>> > From: Jennifer Lawlor
>> > Sent: Thursday, June 2, 2016 5:59 PM
>> > To: [log in to unmask]
>> > Subject: [SOCNET] Web Crawler Recommendations
>> >
>> >
>> >
>> > Hi all,
>> >
>> > I'm working on a project involving hyperlink networks and I'm looking for
>> > some software tools. Can anyone recommend software for web crawling that
>> > takes multiple seed links and can output data in a universal format (e.g.,
>> > a .csv)? I'm hoping to avoid writing the code for a crawler from scratch, so
>> > any advice you can offer about pre-existing software would be really
>> > helpful!
>> >
>> > Best,
>> > Jennifer Lawlor
>> >
>> > --
>> >
>> > Jennifer Lawlor, MA
>> > Graduate Student, Ecological-Community Psychology
>> > Michigan State University
>> > E-mail: [log in to unmask]
>> >
>> > _____________________________________________________________________
>> > SOCNET
>> > is a service of INSNA, the professional association for social network
>> > researchers (http://www.insna.org). To unsubscribe, send an email
>> > message to
>> > [log in to unmask] containing the line UNSUBSCRIBE SOCNET in the
>> > body of
>> > the message.
>>
>
>
>
> --
>
> Gohar Feroz Khan, PhD
> Assistant Professor
> Department of Business Administration
> Keimyung University, Daegu, South Korea.
> Email: [log in to unmask]; Ph: 82-53-580-6371
>
> ----------
> Check out my new book on social media analytics!
> -----------
> Please consider submitting your work to the social media analytics track at
> PACIS2016.
> -----------
>
>
