***** To join INSNA, visit http://www.insna.org *****
Thank you all for your suggestions! I have checked them out - I'll take
a closer look at Nutch and see whether I can adapt it to suit my needs,
but I'm also keen on developing my own crawler client in C#. I'll most
likely use a couple of blog portals to identify the most popular and
active blogs first, and then scan these for linkages between them.
Although it should be possible to let it 'free-crawl', in combination
with language identification functions and IP range restrictions, I
guess that would nevertheless result in a far too large dataset...
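
A minimal sketch of the link-counting step discussed in this thread, written in Python with only the standard library rather than in C# or PHP. It assumes the pages have already been downloaded and that a seed list of blog host names (for example, taken from a blog portal) is available. The names LinkParser, host_of and count_blog_links are hypothetical and used for illustration only; this is not how Nutch or IssueCrawler work internally.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_of(url):
    """Return the lower-cased host name of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def count_blog_links(pages, known_blogs):
    """Count directed links between blogs.

    pages: dict mapping page URL -> HTML text (assumed already downloaded).
    known_blogs: set of blog host names that count as nodes (the seed list).
    Returns a Counter keyed by (source_host, target_host).
    """
    edges = Counter()
    for page_url, html in pages.items():
        source = host_of(page_url)
        if source not in known_blogs:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links against the page URL before comparing hosts.
            target = host_of(urljoin(page_url, href))
            # Keep only external links that point to another known blog.
            if target in known_blogs and target != source:
                edges[(source, target)] += 1
    return edges
```

Restricting targets to the seed list keeps the dataset bounded, at the cost of missing blogs that are not on the list; a free crawl would instead add every newly seen host to the set of nodes.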
Ben Spigel wrote:
> Issuecrawler can export to a UCINET format. I found it a useful tool
> for small-scale network analysis, but if you're looking to crawl more
> than, say, 200 webpages I think you'd be better off with software that
> you can run yourself on your own machine. That way you can customize
> it to your needs, and you're not taking over a small non-profit's
> bandwidth.
>
> Ben Spigel
> PhD Student
> Department of Geography
> University of Toronto
>
> On Fri, Jul 10, 2009 at 10:36 AM, Alvin Chin<[log in to unmask]> wrote:
>
>> As far as I know, when I started to use IssueCrawler, I was able to
>> extract the social graph.
>>
>> Alvin
>>
>>
>> On Fri, Jul 10, 2009 at 9:04 PM, Brian Ulicny<[log in to unmask]> wrote:
>>
>>> We've used Nutch in our work in analyzing the Malaysian blogosphere at
>>> VIStology. Nutch is an open-source, customizable web crawler in Java.
>>> See Nutch.org.
>>>
>>> Nutch works out of the box to crawl websites, but you'll have to do
>>> some (fairly easy) customization to extract the link structure.
>>>
>>> You might want to look at issuecrawler.net for a hosted crawling
>>> service. I'm not sure if the link structure is available as output.
>>>
>>> Best,
>>>
>>> Brian Ulicny
>>> VIStology, Inc.
>>> Framingham, MA
>>> USA
>>>
>>> On 7/10/09, Lukas Zenk <[log in to unmask]> wrote:
>>>
>>>> Hi Carl,
>>>>
>>>> If you'd like to crawl, e.g., Google blogs, you could use the software
>>>> Condor:
>>>> http://www.galaxyadvisors.com/documents/condor.pdf
>>>>
>>>> Regards,
>>>> Lukas
>>>>
>>>> ---
>>>> Lukas Zenk, PhD.cand.
>>>> Member of the scientific staff
>>>> Department of Knowledge and Communication Management
>>>> Danube University Krems - Austria / Europe
>>>> www.donau-uni.ac.at
>>>>
>>>> On Jul 10, 2009, at 2:56 AM, Carl Nordlund wrote:
>>>>
>>>>
>>>>> Hi!
>>>>> Inspired by the amazing work done by the people at the Berkman Center at
>>>>> Harvard
>>>>> (http://cyber.law.harvard.edu/publications/2008/Mapping_Irans_Online_Public/interactive_blogosphere_map),
>>>>> I've been thinking about how to gather blogosphere data, i.e.
>>>>> the creation of a (national) network dataset in which each node is a
>>>>> blog and where the edge weights are the number of directed (external)
>>>>> links from one blog to another. I have started working on a
>>>>> PHP script that recursively crawls a website, checks for external
>>>>> links, and builds a dataset - this of course has to be combined with
>>>>> a check on the nationality of the blog (comparing with national IP
>>>>> ranges and/or language analysis of a sample text).
>>>>>
>>>>> But perhaps I'm trying to reinvent the wheel. Is there any
>>>>> suitable web crawling software that can do the trick? As I
>>>>> understand it, the consulting firm Morningside Analytics helped the
>>>>> Berkman group in their mapping - judging by the rather large
>>>>> dataset, I assume that they used some sort of web crawler. Does anyone
>>>>> know anything more about this?
>>>>>
>>>>> Yours,
>>>>> Carl Nordlund
>>>>> ---
>>>>> Carl Nordlund, BA, PhD student
>>>>> carl.nordlund(at)hek.lu.se
>>>>> Human Ecology Division, Lund university
>>>>> www.hek.lu.se
>>>>>
>>>>> _____________________________________________________________________
>>>>> SOCNET is a service of INSNA, the professional association for social
>>>>> network researchers (http://www.insna.org). To unsubscribe, send
>>>>> an email message to [log in to unmask] containing the line
>>>>> UNSUBSCRIBE SOCNET in the body of the message.
>>>>>
--
Carl Nordlund, BA, PhD student
carl.nordlund(at)hek.lu.se
Human Ecology Division, Lund university
www.hek.lu.se