routines for co-word analysis further extended

Loet Leydesdorff <[log in to unmask]>

Sun, 12 Oct 2008 17:19:32 +0200

***** To join INSNA, visit http://www.insna.org *****

Dear colleagues:

The routines ti.exe (at http://www.leydesdorff.net/software/ti/index.htm) and fulltext.exe (at http://www.leydesdorff.net/software/fulltext/index.htm) now additionally produce an output file "words.dbf" (readable in Excel), which contains the following summations for each word:

1. A variable named "Chi_Sq", which provides the chi-square contribution of each variable (that is, each word); for word(i) this is defined as Σ(i) χ^2 = (Observed(ij) - Expected(ij))^2 / Expected(ij). In other words, it is the sum of the contributions over the column for the variable in each row (Mogoutov et al., 2008);

2. A variable named "ObsExp", which provides the sum of the absolute values |Observed - Expected| for the word as a variable, summed over the column;

3. A variable named "TfIdf", which uses Salton & McGill's (1983: 63) TermFrequency-InverseDocumentFrequency measure (but without Salton's additional + 1; Magerman et al., 2007), defined as follows: WEIGHT(ik) = FREQ(ik) * [log2(n) - log2(DOCFREQ(k))]. This function assigns a high degree of importance to terms occurring in only a few documents in the collection;

4. The word frequency within the set.

These statistics give the researcher an opportunity to refine the list of words to be considered.

References:

Magerman, T., Van Looy, B., & Song, X. (2007). Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications. Paper presented at the 6th Triple Helix Conference, 16-19 May 2007, Singapore.

Mogoutov, A., Cambrosio, A., Keating, P., & Mustar, P. (2008). Biomedical innovation at the laboratory, clinical and commercial interface: A new method for mapping research projects, publications and patents in the field of microarrays. Journal of Informetrics (in press); doi:10.1016/j.joi.2008.06.005.
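For readers who want to check the four statistics by hand, they can be sketched in a few lines of Python. This is only an illustration, not the code of ti.exe or fulltext.exe. It rests on several assumptions: documents are rows and words are columns; Expected is the usual contingency-table value (row total * column total / grand total), which the message does not spell out; and the per-word TfIdf is the WEIGHT(ik) formula summed over all documents. The function name word_statistics and the small example matrix are hypothetical.

```python
import math


def word_statistics(matrix, words):
    """Per-word summary statistics for a document-by-word frequency matrix.

    matrix[i][k] is the frequency of word k in document i.
    Assumption: Expected = row total * column total / grand total.
    """
    n_docs = len(matrix)
    row_totals = [sum(row) for row in matrix]  # one total per document
    col_totals = [sum(row[k] for row in matrix) for k in range(len(words))]
    grand_total = sum(row_totals)

    results = {}
    for k, word in enumerate(words):
        chi_sq = 0.0
        obs_exp = 0.0
        for i in range(n_docs):
            observed = matrix[i][k]
            expected = row_totals[i] * col_totals[k] / grand_total
            if expected > 0:
                chi_sq += (observed - expected) ** 2 / expected
            obs_exp += abs(observed - expected)

        # Number of documents in which the word occurs at least once.
        doc_freq = sum(1 for row in matrix if row[k] > 0)
        # log2(n) - log2(DOCFREQ(k)), without the additional + 1.
        idf = math.log2(n_docs) - math.log2(doc_freq) if doc_freq else 0.0
        # Assumption: WEIGHT(ik) summed over documents i, which equals
        # the word's total frequency times idf.
        tf_idf = col_totals[k] * idf

        results[word] = {
            "Chi_Sq": chi_sq,
            "ObsExp": obs_exp,
            "TfIdf": tf_idf,
            "Freq": col_totals[k],
        }
    return results


# Hypothetical toy data: three documents (rows), three words (columns).
words = ["network", "triple", "helix"]
matrix = [
    [2, 0, 1],
    [2, 1, 0],
    [2, 0, 0],
]
stats = word_statistics(matrix, words)
for word in words:
    print(word, stats[word])
```

In this toy set, "network" occurs in every document, so its TfIdf is zero, whereas "triple" and "helix" each occur in one document only and receive a positive weight. That is the behaviour described under point 3.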
________________________________
Loet Leydesdorff
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam
Tel. +31-20-525 6598; fax: +31-842239111

