
The Triple-Helix Indicator and its Extension to Four Dimensions: 
The Measurement of Configurational Information in More than Two Dimensions

The program th4.exe reads an input file “data.txt” and generates (or adds to an existing) file th4.dbf containing probabilistic entropy values and mutual information values for three and/or four nominal variables. (The source code can be found here.) In a number of studies (see the reference list) we used the mutual information in three dimensions as a Triple Helix indicator; for example, to measure the reduction of uncertainty (e.g., Yeung, 2008:59f.; cf. McGill, 1954) in the interactions among the distributions of firms over geographical location (addresses), organizational size, and technological capacity (Lengyel & Leydesdorff, 2011; Leydesdorff et al., 2006; Leydesdorff & Fritsch, 2006; Leydesdorff & Strand, in press).
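
For readers less familiar with the measure, the mutual information in three dimensions is commonly written as an alternating sum of Shannon entropies (cf. McGill, 1954). The notation below is a sketch: it assumes entropies in bits and generic dimension labels x, y, and z, and it is not copied from the th4.exe source code.

```latex
\begin{align}
H_x     &= -\sum_{x} p_x \log_2 p_x \\
H_{xy}  &= -\sum_{x,y} p_{xy} \log_2 p_{xy} \\
T_{xyz} &= H_x + H_y + H_z - H_{xy} - H_{xz} - H_{yz} + H_{xyz}
\end{align}
```

Under this sign convention, a negative value of T_xyz indicates a reduction of uncertainty at the level of the three-dimensional system.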

Using publications as units of analysis, the focus can be on university, industry, and/or government addresses in co-authorship relations (Kwon et al., 2012; Leydesdorff, 2003; Leydesdorff & Sun, 2009; Park et al., 2005; Ye et al., in preparation). A program for examining TH relations on a case-by-case basis is available at . (The program th.exe also computes Krippendorff's (2009a) I(ABC→AB, AC, BC) and the redundancy R; T = I – R (Krippendorff, 2009b; Leydesdorff, 2009, 2010).)

In a number of studies (and in the literature) questions have been raised about extending the Triple Helix to more than three helices (e.g., Carayannis & Campbell, 2009 and 2010; Leydesdorff, 2012). The issue is urgent because the distinction between international and national was found to be an important additional dimension in a number of recent studies (Ye et al., in preparation). One may wish to include international coauthorship as a fourth variable (Leydesdorff & Sun, 2009; Kwon et al., 2010) or “foreign driven investment” in the case of firm data (Lengyel & Leydesdorff, 2011; Strand & Leydesdorff, in press).

This routine (th4.exe) is meant to facilitate the computation of these values in the case of large sets. Unlike th.exe, this version operates on nominal values; for example, industry codes, names of regions, or classifications. The older routine th.exe uses numerical values. In the case of numerical values, one may wish to bin or dichotomize them first. For example, if three addresses are provided, of which two are from universities and one from industry, this U-I relation should be counted as “1”. In other words, this (!) program reads numbers as character strings.
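
The preprocessing described above could be sketched as follows in Python. This is only an illustration and not part of th4.exe; the function names and the size-class boundaries are hypothetical.

```python
from bisect import bisect_left


def ui_relation(n_university, n_industry):
    """Return "1" if at least one university and one industry address
    are present, else "0". The result is a string label, not a count."""
    return "1" if n_university > 0 and n_industry > 0 else "0"


def bin_label(value, upper_bounds):
    """Map a number to the index of the first bin whose (inclusive) upper
    bound is >= value, returned as a string. Values above the last bound
    fall into one extra top bin. upper_bounds must be sorted ascending."""
    return str(bisect_left(upper_bounds, value))


# Hypothetical size classes by number of employees: 0, 1-9, 10-49, 50+.
SIZE_BOUNDS = (0, 9, 49)

# Two university addresses and one industry address: one U-I relation.
print(ui_relation(2, 1))
# A firm with 12 employees falls into the third class, labelled "2".
print(bin_label(12, SIZE_BOUNDS))
```

Because the labels are strings, "2" is simply the name of a category; th4.exe would not treat it as larger or smaller than "1".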

Input file
The input file is a text file with one case (firm, publication, patent, etc.) on each line, and five variables. The first variable is a case identifier; for example, “firm1” or “id0001”. The second to fifth variables are read as four nominal variables (including “0” and “1”). If the fifth variable is missing, all its values are set to zero, and the corresponding dimension (“z”) is not computed. The four dimensions are indicated as w, x, y, and z, respectively. Each variable in the input file has to be enclosed in double quotation marks, and the variables are delimited with commas, as follows:

     "id1", "1", "b", "region1", "2"
     "id2", "2", "a", "region2", "1"
     "id3", "1", "a", "region2", "2"
     "id4", "1", "b", "region5", "1"

For example, in the case of address information, the second variable may indicate the presence of a university address (Y/N), the third an industrial address, etc. In the case of firm data, the second variable may be a size category (e.g., from zero for firms without employees to six for firms with more than 500 employees), the third variable a technology code (e.g., OECD’s NACE codes), the fourth an indication of the region, and the fifth whether the firm is domestically owned or a subsidiary of a foreign company.

The size of the file is not limited otherwise, but it has to be smaller than 2 GByte. The input file should be named “data.txt”. Do not place a header with variable names on the first line, because these names would be counted as separate categories. Note that typos may lead to the declaration of an additional class because the program indexes on the strings. The program and the input file have to be placed in the same folder.
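
Since typos silently create extra categories, it may help to list the categories in each column before running th4.exe. The Python sketch below is not part of the program; it assumes that the file uses plain ASCII double quotes, and the function names are hypothetical. The sample is read from a string here; for a real file one would pass open("data.txt", newline="") instead.

```python
import csv
import io
from collections import Counter


def read_cases(handle):
    """Parse lines such as "id1", "1", "b", "region1", "2" into
    (id, w, x, y, z) tuples. A missing fifth variable is set to "0",
    mirroring the behaviour described for th4.exe."""
    rows = []
    # skipinitialspace lets the quoted fields follow ", " with a space.
    for fields in csv.reader(handle, skipinitialspace=True):
        if not fields:
            continue  # skip blank lines
        fields = [f.strip() for f in fields]
        if len(fields) == 4:
            fields.append("0")
        if len(fields) != 5:
            raise ValueError(f"expected 4 or 5 fields, got {fields!r}")
        rows.append(tuple(fields))
    return rows


def category_counts(rows):
    """Count how often each category occurs in the dimensions w, x, y, z."""
    names = ("w", "x", "y", "z")
    return {name: Counter(row[i + 1] for row in rows)
            for i, name in enumerate(names)}


sample = io.StringIO(
    '"id1", "1", "b", "region1", "2"\n'
    '"id2", "2", "a", "region2", "1"\n'
    '"id3", "1", "a", "region2", "2"\n'
    '"id4", "1", "b", "region5", "1"\n'
)
rows = read_cases(sample)
counts = category_counts(rows)
for name, counter in counts.items():
    # A category that occurs only once may be a typo worth checking.
    print(name, dict(counter))
```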

The program generates the file th4.dbf if it is not yet present in this folder; if the file is present, a new record is appended to it. This file can be read using Excel or a similar program. As said, the variables are denoted “w”, “x”, “y”, and “z”, and the new record contains the uncertainties in these four dimensions (Hw, Hx, Hy, Hz), the joint entropies (such as Hwx, Hwxy, Hwxyz, etc.), and all possible transmissions (Twx, Twxy, Twxyz, etc.) among them.
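
To check the output by hand on a small file, the values can be recomputed along the following lines. This Python sketch is not the th4.exe source code: it assumes entropies in bits and transmissions defined as the alternating sum of joint entropies over all subsets of the dimensions involved (so that Twxy = Hw + Hx + Hy – Hwx – Hwy – Hxy + Hwxy). The labels mimic the names mentioned above, but the actual field layout of th4.dbf may differ.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, dims):
    """Shannon entropy (bits) of the joint distribution over the columns
    whose indices are listed in dims."""
    counts = Counter(tuple(row[d] for d in dims) for row in rows)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())


def transmission(rows, dims):
    """Mutual information among dims as an alternating sum of entropies:
    plus for subsets of odd size, minus for subsets of even size."""
    total = 0.0
    for k in range(1, len(dims) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(dims, k):
            total += sign * entropy(rows, subset)
    return total


# The (w, x, y, z) values of four example cases, without the identifier.
rows = [
    ("1", "b", "region1", "2"),
    ("2", "a", "region2", "1"),
    ("1", "a", "region2", "2"),
    ("1", "b", "region5", "1"),
]

names = "wxyz"
record = {}
for k in range(1, len(names) + 1):
    for subset in combinations(range(len(names)), k):
        label = "".join(names[d] for d in subset)
        record["H" + label] = entropy(rows, subset)
        if k >= 2:
            record["T" + label] = transmission(rows, subset)

for key in sorted(record):
    print(f"{key:6s} {record[key]: .4f}")
```

With four cases the numbers are of course only illustrative; the point is that every value in the new record can be traced back to a frequency count.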

The current version is very much a beta version. Please provide feedback for further improvements if bugs are encountered. Carefully check the output for errors!

I acknowledge Balazs Lengyel for helping to develop this routine.
** apologies for cross-postings

Loet Leydesdorff 
Amsterdam School of Communications Research (ASCoR)
Kloveniersburgwal 48, 1012 CX Amsterdam.
Tel. +31-20-525 6598; fax: +31-842239111

SOCNET is a service of INSNA, the professional association for social
network researchers.