*****  To join INSNA, visit http://www.insna.org  *****

                    Clairlib, The Clair Library

                    version 1.03 is now available

              http://belobog.si.umich.edu/clair/clairlib



INTRODUCTION

The University of Michigan's CLAIR (Computational Linguistics And
Information Retrieval) group is happy to present version 1.03 of 
clairlib, the Clair library. 

The Clair library is intended to simplify a number of generic tasks in
Natural Language Processing (NLP), Information Retrieval (IR), and
Network Analysis. Its architecture also allows for external software
to be plugged in with very little effort.

Two distributions of the Clair library are available: Clairlib-core,
with essential functionality and minimal dependence on external
software, and Clairlib-ext, with extended functionality that may be of
interest to a smaller audience.  Work is underway on Clairlib-bio and
Clairlib-polisci, extensions that will be of interest to people
working on Bioinformatics and Political Science.

FUNCTIONALITY

Native in Clairlib-core: Tokenization, Summarization, LexRank, Biased
LexRank, Document Clustering, Document Indexing, PageRank, Biased
PageRank, Web Graph Analysis, Network Generation, Power Law
Distribution Analysis, Network Analysis (clustering coefficient,
degree distribution plotting, average shortest path, diameter,
triangles, shortest path matrices, connected components), Cosine
Similarity, Random Walks on Graphs, Statistics (distributions, tests),
Tf, Idf, Community Finding*, Phrase-Based Queries*, Fuzzy OR Queries*

Functionality imported into Clairlib-core: Stemming, Sentence
Segmentation, Web Page Download, Web Crawling, XML Parsing, XML Tree
Building, XML Writing

Clairlib-ext features: Sentence Segmentation using MxTerminator,
Sentence Parsing using the Charniak Parser and Chunklink

* New and expanded functionality available in this latest release


CHANGES

   1.03 August 2007
    * Added functionality to perform community finding within weighted,
    undirected networks
    * Added util/chunk_document.pl to break documents into smaller files
    by word number
    * Added option to retain punctuation for idf and tf queries
    * Added option to print out full lists of idf and tf values for a
    corpus
    * LexRank moved from Clair::Network to
    Clair::Network::Centrality::LexRank
    * LexRank usage now follows the same pattern as the other
    centrality modules

   1.02 July 2007
    * Distribution reorganized in standard format
    * Improved and expanded installation documentation (INSTALL)
    * Improved POD (inline) documentation
    * Additional examples
    * Updated PDF documentation

   1.01 May 2007
    * Added Phrase-based Retrieval and Fuzzy OR Queries
    * Extended Clairlib-ext with interfaces for the Cluster class and
    the Document class to the Weka machine learning toolkit
    * Added LSI functionality
    * Extended parsing of strings / files into Documents
    * Added perceptron learning and classification for documents

   1.0 RC1 April 2007
    * Moved all Clair modules beneath the Clair::* namespace, updated
    documentation
    * Improved Network Analysis, added Clustering Coefficients code
    * Added Network Generation and Statistics modules


DOWNLOAD

Visit http://belobog.si.umich.edu/clair/clairlib/ or write to
[log in to unmask] to get a copy.  Researchers doing work on
Bioinformatics or Political Science can write to [log in to unmask] to
receive beta versions of Clairlib-bio or Clairlib-polisci.


FUNDING

This work has been supported in part by National Institutes of Health 
grants R01 LM008106 "Representing and Acquiring Knowledge of Genome 
Regulation" and U54 DA021519 "National center for integrative 
bioinformatics", as well as by grants IDM 0329043 "Probabilistic and 
link-based Methods for Exploiting Very Large Textual Repositories," 
0534323 "Collaborative Research: BlogoCenter - Infrastructure 
for Collecting, Mining and Accessing Blogs," and 0527513 "The Dynamics
of Political Representation and Political Rhetoric," from the National
Science Foundation.

ABOUT

The Clair Library is developed by the Clair group at the University of
Michigan.

Project design: Dragomir R. Radev

Main implementers: Jonathan dePeri, Anthony Fader, Joshua Gerrish,
Bryan Gibson, Mark Hodges, Mark Joseph, Dragomir Radev, and Mark
Schaller

Additional code by: Timothy Allison, Michael Dagitses, Aaron Elkiss,
Gunes Erkan, Scott Gifford, Patrick Jordan, Samuela Pollack, and Adam
Winkel

_____________________________________________________________________
SOCNET is a service of INSNA, the professional association for social
network researchers (http://www.insna.org). To unsubscribe, send
an email message to [log in to unmask] containing the line
UNSUBSCRIBE SOCNET in the body of the message.