***** To join INSNA, visit http://www.sfu.ca/~insna/ *****

This is my first day on this list, but I think it's an appropriate forum to announce a website I first built three years ago using Amazon's affinity information on customers' buying patterns. In early 2000 I gathered data on 3,000 musical artists; a year later I picked up about 5,000 artists, with about 20,000 links. Then Amazon changed their page format slightly, a sysadmin rearranged files on my machine, I was busy with work, and I left the site alone, languishing with data that got more out of date every week.

Then last summer Amazon announced their Web Services program (AWS), which made their data available directly without having to parse HTML. So I brought my server back to life, wrote a couple of programs, and now have a database of about 440,000 items and 3.6 million links. The items currently cover books, CDs, and videos. The site is http://www.baconizer.com/ and Jon Udell wrote it up a month ago in his infoworld.com blog, along with a paragraph on Valdis Krebs' work.

I haven't found any business applications for the site; it's basically a cool diversion. I'm reading up on clustering algorithms but haven't implemented anything yet. Still, using the site definitely shows clusters. A query like http://www.baconizer.com/cgi-bin/boston?title1=6300988678&title2=0380809087 makes this obvious.

The Baconizer draws the shortest path between any two nodes in its database. Of April's 400,000 nodes, 380,000 were in one strongly connected component, which means you could get from any one of them to any other by following links. Given two nodes A and B in different components, there might be a path from A to B or from B to A, but not both; if there were paths in both directions, A and B would be in the same component. I haven't calculated the May data's components, but I've noticed that items like Buffy videos that were in a remote component are now in the main one. Computationally, finding a shortest path is easy: it's about 50 lines of C code.
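The Baconizer's own C code isn't shown here, so what follows is only a minimal sketch of one standard way to do it: breadth-first search, which finds a shortest path in an unweighted directed graph in time linear in the number of nodes plus links. It assumes, hypothetically, that items are numbered 0..n-1 and that the links are stored in compressed sparse row form; the names `Graph`, `row_start`, `targets`, and `bfs_path` are illustrative, not from the site.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CSR layout: the out-links of node u are
   targets[row_start[u]] .. targets[row_start[u + 1] - 1]. */
typedef struct {
    int n;                 /* number of nodes */
    const int *row_start;  /* length n + 1 */
    const int *targets;    /* length row_start[n] */
} Graph;

/* Breadth-first search from src, following links in their direction.
   Writes the nodes src..dst into path (capacity n) and returns the
   number of nodes on the path, or -1 if dst is unreachable from src. */
int bfs_path(const Graph *g, int src, int dst, int *path)
{
    assert(src >= 0 && src < g->n && dst >= 0 && dst < g->n);
    int *parent = malloc(sizeof(int) * (size_t)g->n);
    int *queue = malloc(sizeof(int) * (size_t)g->n);
    if (!parent || !queue) { free(parent); free(queue); return -1; }
    for (int i = 0; i < g->n; i++) parent[i] = -1;   /* -1 = unvisited */

    int head = 0, tail = 0;
    parent[src] = src;             /* mark the source as visited */
    queue[tail++] = src;
    /* Stop as soon as dst has been discovered. */
    while (head < tail && parent[dst] == -1) {
        int u = queue[head++];
        for (int e = g->row_start[u]; e < g->row_start[u + 1]; e++) {
            int v = g->targets[e];
            if (parent[v] == -1) {
                parent[v] = u;
                queue[tail++] = v;  /* each node is enqueued at most once */
            }
        }
    }

    int len = -1;
    if (parent[dst] != -1) {
        /* Walk parent pointers back from dst, then reverse in place. */
        len = 0;
        for (int v = dst; v != src; v = parent[v]) path[len++] = v;
        path[len++] = src;
        for (int i = 0, j = len - 1; i < j; i++, j--) {
            int t = path[i]; path[i] = path[j]; path[j] = t;
        }
    }
    free(parent);
    free(queue);
    return len;
}
```

For example, with links 0->1, 0->2, 1->3, 2->3, 3->4, a search from 0 to 4 returns 4 (the path 0, 1, 3, 4), while a search from 4 back to 0 returns -1: the one-way reachability described above for nodes in different components.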
The clustering is harder, but I can run a process for days if that's what it takes. The next step is labeling the clusters. I should check whether Amazon returns genre information now; I wrote the gatherer against version 1 of AWS, and they're on version 3 now.

I update the data monthly. One of the terms of using the Amazon web service is that a developer shouldn't make more than one query per second, so pulling down 400,000 items takes at least 400,000 seconds (about 4.6 days), or about five days allowing for network timeouts and retries. In practice it takes much longer, because I throttle the gatherer to less than one request per second: the live web site queries the service as well.

The numerical data is interesting. For one thing, the links are directed, and the number of links per item follows a power law with an exponent of around -2, if I recall correctly. I seed the graph with albums by the Bacon Brothers, and every month I reach about 400,000 items. This time I merged the May data with April's: 40,000 nodes from April weren't found, but 40,000 new ones were. This fits an aging model for the sort of items Amazon sells. I merge the data only to make the site richer, and keep the original data sets intact.

The average shortest distance is around 11, and the longest shortest path in the May data is 53 links. The nodes at the center tend to be history textbooks, blues albums, and mainstream best-sellers. At the periphery I find genres such as romance, pre-teen girls' series like The Baby-Sitters Club, videos of TV series, books on comic books and on Japanese anime and manga characters, live CDs from prolific bands like Phish and Pearl Jam, and various kinds of Bob Marley remixes. It gives an interesting view of North American pop culture.

I'm wondering if anyone here is interested in pursuing this as a research tool. Also, unlike many of the other networks I've seen, this one is built from what people are doing with their hard-earned cash right now.
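The post doesn't say how the average distance of about 11 and the longest path of 53 were computed. One straightforward approach, sketched here as an assumption rather than the site's actual method, is to run a breadth-first search from every node and average the distances over all ordered pairs (u, v) with v reachable from u and v != u; the largest of those distances is the longest shortest path. The CSR `Graph` layout and the names `PathStats` and `path_stats` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CSR layout: the out-links of node u are
   targets[row_start[u]] .. targets[row_start[u + 1] - 1]. */
typedef struct {
    int n;
    const int *row_start;
    const int *targets;
} Graph;

typedef struct {
    double mean;      /* average distance over reachable ordered pairs, u != v */
    int longest;      /* largest shortest-path length found, in links */
    long long pairs;  /* number of reachable ordered pairs counted */
} PathStats;

/* One BFS per source node: O(n * (n + m)) time, O(n) extra space. */
PathStats path_stats(const Graph *g)
{
    PathStats s = {0.0, 0, 0};
    int *dist = malloc(sizeof(int) * (size_t)g->n);
    int *queue = malloc(sizeof(int) * (size_t)g->n);
    if (!dist || !queue) { free(dist); free(queue); return s; }
    double total = 0.0;

    for (int src = 0; src < g->n; src++) {
        for (int i = 0; i < g->n; i++) dist[i] = -1;   /* -1 = unreached */
        int head = 0, tail = 0;
        dist[src] = 0;
        queue[tail++] = src;
        while (head < tail) {
            int u = queue[head++];
            for (int e = g->row_start[u]; e < g->row_start[u + 1]; e++) {
                int v = g->targets[e];
                assert(v >= 0 && v < g->n);   /* malformed-input check */
                if (dist[v] == -1) {
                    /* First time v is reached, so dist[v] is its
                       shortest distance from src. */
                    dist[v] = dist[u] + 1;
                    queue[tail++] = v;
                    total += dist[v];
                    s.pairs++;
                    if (dist[v] > s.longest) s.longest = dist[v];
                }
            }
        }
    }
    if (s.pairs > 0) s.mean = total / (double)s.pairs;
    free(dist);
    free(queue);
    return s;
}
```

On a graph the size of the May data this means one BFS per node, which is a lot of work even for a process that runs for days. Running the BFS from a random sample of sources is a common way to estimate the mean, though a sample can miss the true longest path.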
I find the patterns that emerge from that kind of data more interesting than which actors worked with which others, which are drier facts. That didn't stop me from naming the site after the Oracle of Bacon.

I'm not a researcher; I work for a small software company building development tools that let programmers use open-source languages in the new Windows .NET environment. About the only application of the Baconizer to my work is that I can now spot a cyclic graph from miles away.

Eric Promislow

http://www.amazon.com/gp/aws/landing.html

_____________________________________________________________________
SOCNET is a service of INSNA, the professional association for social network researchers (http://www.sfu.ca/~insna/).