***** To join INSNA, visit http://www.sfu.ca/~insna/ *****

I have compiled the list of suggestions I received regarding my inquiry about using logistic regression to analyze predictors of whether a link exists between two individuals in a network. I first reprint my original inquiry below, followed by the responses. Thanks to everyone for their replies.

Alan Reifman

------------------------------------------------------------
Original Inquiry:

I am working on a research project that has the following (hypothetical) design. Say that we have 10 individuals in some community and we want to predict whether a friendship exists between each pair of individuals. A matrix of non-directional friendship links (1 = yes, 0 = no) can be formed with 45 elements (i.e., a lower-triangular matrix with no self ties).

I'm a relative newcomer to quantitative network studies, but the seemingly simple analytic design I came up with was to create a data set with 45 lines of data (one for each potential pairing). The dependent variable on each line would be the aforementioned 1 or 0 for existence of a friendship. Each line would also contain several predictor variables for the potential pair, some dichotomous (e.g., do they work for the same firm?, again scored 1 or 0) and some quantitative (e.g., how many blocks apart do they live?). One could then perform a logistic regression with the dichotomous DV and the various predictor variables. An odds ratio associated with each predictor would reveal whether the predictor appeared to contribute to pairs' being friends with each other.

I just finished reading Wasserman and Faust's "Social Network Analysis" (in toto) and, as best I could tell, I did not find any analytic strategy like the one I described above. Wasserman and Faust focused more on blockmodels, popularity and reciprocity parameters, etc.
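
The design described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the attribute names (firm, block), the simulated friendship outcomes, and the coefficients used to simulate them are all made up, and the fitting routine is a crude gradient ascent written with the Python standard library rather than a routine from a statistics package. It shows the 45-row dyadic data set and the odds ratio per predictor, and it does nothing about the non-independence of dyads that share an individual.

```python
import itertools
import math
import random

random.seed(0)  # fixed seed so the made-up data are reproducible

N_PEOPLE = 10
# Hypothetical person-level attributes (assumed for illustration only).
firm = [random.choice("ABC") for _ in range(N_PEOPLE)]
block = [random.randint(0, 20) for _ in range(N_PEOPLE)]


def sigmoid(z):
    """Logistic function, with z clipped to avoid overflow in exp()."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


# One row per unordered pair (i < j): 10 * 9 / 2 = 45 potential ties.
dyads = []
for i, j in itertools.combinations(range(N_PEOPLE), 2):
    same_firm = 1 if firm[i] == firm[j] else 0
    distance = abs(block[i] - block[j])  # blocks apart
    # Simulated outcome; the coefficients here are arbitrary assumptions.
    p_tie = sigmoid(-0.5 + 1.5 * same_firm - 0.15 * distance)
    friend = 1 if random.random() < p_tie else 0
    dyads.append((i, j, friend, same_firm, distance))


def fit_logistic(X, y, step=0.01, n_iter=5000):
    """Crude fixed-step gradient ascent on the mean log-likelihood."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, target in zip(X, y):
            resid = target - sigmoid(sum(b * x for b, x in zip(beta, row)))
            for k, x in enumerate(row):
                grad[k] += resid * x
        beta = [b + step * g / len(X) for b, g in zip(beta, grad)]
    return beta


X = [[1.0, d[3], d[4]] for d in dyads]  # intercept, same_firm, distance
y = [d[2] for d in dyads]
beta = fit_logistic(X, y)
odds_ratios = [math.exp(b) for b in beta[1:]]  # one per predictor

print(len(dyads), "dyads")
print("odds ratios (same_firm, distance):", odds_ratios)
```

An odds ratio above 1 for same_firm, or below 1 for distance, would point in the direction the inquiry describes, but any significance test built on these 45 rows treats them as independent observations, which they are not.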
I do recognize that the design I've proposed has a potential problem with non-independence of observations (i.e., the same individual is implicated in several potential pairings). My question is twofold: (a) putting aside the independence issue for the moment, does my design sound reasonable? and (b) could the independence problem be overcome (at least to some extent) by using alpha levels more stringent than the usual .05 in order to adjust for the (presumably) inflated degrees of freedom in my design?

Thanks,
Alan Reifman

------------------------------------------------------------
REPLIES
------------------------------------------------------------

Your design sounds reasonable, although purebred statisticians would be infuriated at my saying this. The good news is that you can take care of the second- and third-order correlations using the fixed effects or bilinear mixed effects approaches. See, e.g., Peter Hoff's work and the bibliography therein:

Hoff, P.D. "Bilinear Mixed-Effects Models for Dyadic Data." http://www.stat.washington.edu/hoff/Preprints/dyadic.pdf

Gueorgi Kossinets
------------------------------------------------------------

Look at: http://kentucky.psych.uiuc.edu/pstar/

Miller McPherson
------------------------------------------------------------

This is similar to an approach we have used and validated. What you are doing is constructing a similarity index. It works better if you do it in terms of relative similarity: you have a matrix A that is people by attributes, such that each cell is 1 if that person has that attribute. The index is then the sum across k of (Aik*Ajk) / the sum across k of (Aik), where i and j are people and k indexes the attributes.

See also the papers on Construct, where we used this but interpreted the attributes as knowledge:

Kathleen Carley, 1990. "Group Stability: A Socio-Cognitive Approach." In Advances in Group Processes, eds. E. Lawler, B. Markovsky, C. Ridgeway & H. Walker, Greenwich, CT: JAI Press, Vol. 7, pp. 1-44.
Kathleen Carley
------------------------------------------------------------

Probably someone has already told you to have a look at p* models (Anderson, Wasserman and Crouch 1999; Wasserman and Pattison 1996; based on ideas of Frank and Strauss 1986 and Holland and Leinhardt 1981). But no one has posted an answer to the list yet!

These are logit regression models for social networks. You predict the probability of a tie between two actors using the actors' attributes as well as indices from social network analysis (e.g., actors' degrees, transitivity, density). In contrast to the usual logit models, you can model ties as not independent: a good model for predicting a link from i to j considers, for example, whether there is a tie from j to i and how strong the tendency towards symmetry is in the network. (As you have undirected ties only, you might consider a similar procedure regarding transitivity!)

A good article that takes into account different levels of network data (in the sense of multilevel modelling: e.g., density is a variable at the level of the whole network, degree is an actor-level variable) is Hummell and Sodeur (1997).

Anderson, C.J., Wasserman, S. & Crouch, B. (1999). A p* primer: Logit models for social networks. Social Networks, 21, 37-66.

Frank, O. & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81, 832-842.

Holland, P.W. & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76, 33-65.

Hummell, H.J. & Sodeur, W. (1997). Structural analysis of social networks with respect to different levels of aggregation. Mathématiques, Informatique et Sciences Humaines, 35, 37-60.

Wasserman, S. & Pattison, P. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61, 401-425.

Dr.
Mark Trappmann
------------------------------------------------------------

Interesting that you arrived at a logistic model for social networks independently! However, the p1 model described in Wasserman and Faust (p. 614) can also be expressed as a logistic regression model because of the general relationship between log-linear models and logistic regression models (the former fits cell frequencies, the latter fits a binary outcome), so this general approach has been around for a while and there is a fair amount of work in this area. Scan the social network literature for mention of the p* and p2 frameworks.

Sam Field
------------------------------------------------------------

Your study design is fine, and quite a few studies like this have been done in the past. The issue of non-independence of observations is not the only one you would have to struggle with when conducting this type of analysis. While there are many methods available, try the simplest approach, commonly referred to as QAP (Quadratic Assignment Procedure). You can find this option in UCINET. You may want to refer to the following papers for detailed explanations of this technique:

Krackhardt, D. 1988. Predicting with networks: Nonparametric multiple regression analysis of dyadic data. Social Networks, 10: 359-381.

Gulati, R. 1995. Social structure and alliance formation patterns: A longitudinal analysis. Administrative Science Quarterly, 40: 619-652.

Andrew V. Shipilov
------------------------------------------------------------

I began doing dyadic analyses of the kind you mentioned back in the mid-1980s. Gulati, among others, drew on these analyses. One representative publication is:

Mizruchi, Mark S. 1989. "Similarity of political behavior among large American corporations." American Journal of Sociology 95:401-424.

That paper uses a fixed effects approach to handle the non-independence problem. Subsequently, on Krackhardt's suggestion, I used quadratic assignment.
For analyses using that, as well as detailed discussions about the entire issue of dyadic network analyses, see:

Mizruchi, Mark S. 1992. The Structure of Corporate Political Action: Interfirm Relations and Their Consequences. Cambridge, MA: Harvard University Press.

Mark S. Mizruchi
------------------------------------------------------------

You may have got this possible answer to your question already, but I think the p2 model (an extension of the p1 model described in Wasserman & Faust) would suit your needs. It is a specific kind of logistic regression model that deals with dependence between relations from and to the same individuals. The model is implemented in StOCNET, which is free software. For now, I refer you to StOCNET's website, where you can find more documentation on the software and on the p2 model (among other models).

http://stat.gamma.rug.nl/stocnet

Marijtje van Duijn
_____________________________________________________________________

SOCNET is a service of INSNA, the professional association for social network researchers (http://www.sfu.ca/~insna/). To unsubscribe, send an email message to [log in to unmask] containing the line UNSUBSCRIBE SOCNET in the body of the message.