R-USERS-L@LISTS.UFL.EDU


Subject: Re: lm() with NA's
From: Ben Bolker <[log in to unmask]>
Reply-To: UF R Users List <[log in to unmask]>
Date: Sun, 2 Dec 2018 20:30:08 -0500
Content-Type: text/plain

  At the risk of getting into the statistical weeds here:

I definitely agree with the advice about excluding the mostly-NA
variables.  I definitely *disagree* with the whole VIF/exclude-collinear
terms approach promoted by Zuur and others.  It *may* be OK if your
*only* goal is to find the best predictive model (Dormann et al. 2012),
but if you care at all about valid inference (p-values, confidence
intervals, etc.) then it doesn't work.  I agree with Morrissey and
Ruxton (2018)'s take on this:

> There is no general sense in which collinearity is a problem. We
> suspect that the perception of collinearity as a hindrance to analysis
> stems from misconceptions about interpretation of multiple regression
> models, and so we pursue discussions about these misconceptions in this
> light. In particular, collinearity causes multiple regression
> coefficients to be less precisely estimated than corresponding simple
> regression coefficients. This should not be interpreted as a problem, as
> it is perfectly natural that direct effects should be harder to
> characterise than univariate associations. Purported solutions to the
> perceived problems of collinearity are detrimental to most biological
> analyses.
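
The precision point in the quoted passage can be illustrated with a small simulation. This is only a sketch in base R: the seed, sample size, coefficients, and the correlation of about 0.95 are arbitrary assumptions and do not come from the thread or the paper.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n  <- 200
x1 <- rnorm(n)
# x2 is built to be strongly correlated with x1 (about 0.95).
x2 <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)
y  <- x1 + x2 + rnorm(n)

fit_simple   <- lm(y ~ x1)        # univariate association of y with x1
fit_multiple <- lm(y ~ x1 + x2)   # effect of x1 with x2 also in the model

se_simple   <- summary(fit_simple)$coefficients["x1", "Std. Error"]
se_multiple <- summary(fit_multiple)$coefficients["x1", "Std. Error"]

# Variance inflation factor for x1, computed by hand as 1 / (1 - R^2),
# where R^2 comes from regressing x1 on the other predictor.
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)

print(c(se_simple = se_simple, se_multiple = se_multiple, vif_x1 = vif_x1))
```

The standard error for x1 is larger in the multiple regression, which is the loss of precision the passage describes. The two models estimate different quantities, so the larger standard error is not by itself a sign that something is wrong.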

Graham's paper is also good IMO.

===
Morrissey, Michael B., and Graeme D. Ruxton. “Multiple Regression Is Not
Multiple Regressions: The Meaning of Multiple Regression and the
Non-Problem of Collinearity.” Philosophy, Theory, and Practice in
Biology 10, no. 3 (2018).

Dormann, Carsten F., Jane Elith, Sven Bacher, Carsten Buchmann, Gudrun
Carl, Gabriel Carré, Jaime R. García Marquéz, et al. “Collinearity: A
Review of Methods to Deal with It and a Simulation Study Evaluating
Their Performance.” Ecography, 2012.
https://doi.org/10.1111/j.1600-0587.2012.07348.x

Graham, Michael H. “Confronting Multicollinearity in Ecological Multiple
Regression.” Ecology 84, no. 11 (November 1, 2003): 2809–15.
https://doi.org/10.1890/02-3114


On 2018-12-02 8:23 p.m., Klarenberg,Geraldine wrote:
> From a methodological point of view, I would recommend checking for
> collinearity/correlation of (predictor) variables before doing a
> regression anyway. Just looking at the names of the variables, I’m
> pretty sure some of them have strong correlations. Strong
> correlation/collinearity confounds your results and makes it difficult to
> interpret your coefficient values.
> I usually run a Variance Inflation Factor (VIF) analysis on my variables
> before doing further analyses
> (see https://onlinecourses.science.psu.edu/stat501/node/347/).
> 
> Also, from the summary you sent, it seems TON_2, FECCOL_CFU_2 and
>  ECOLI_MPN_2 are largely NAs? Excluding them might help in having enough
> data left.
> 
> Geraldine Klarenberg, PhD
> Post-Doctoral Associate
> Department of Wildlife Ecology and Conservation / Agricultural and
> Biological Engineering
> University of Florida
> Tel: 352-294-7581
> Cell: 386-517-3952
> Email: [log in to unmask]
> 
> 
> 
> 
>> On Dec 2, 2018, at 7:33 PM, Ben Bolker <[log in to unmask]> wrote:
>>
>>  Yeah.  Unfortunately this is just not an easy problem.  Random
>> forests can handle this kind of prediction with missing values
>> relatively simply; for most other methods including regression
>> methods, some kind of imputation is usually necessary, and as far as I
>> am aware (or can tell by brief  googling) these methods aren't built
>> into regression tools; they have to be run/dealt with at a separate
>> stage of the analysis.  I would strongly recommend Harrell's book - he
>> has a lot of useful advice.
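
A rough sketch of what "imputation as a separate stage" can look like in base R. The toy data frame, its column names, and the helper `impute_median()` are hypothetical. The median fill is a deliberately crude single imputation used only to show the two-stage workflow; it is not the kind of imputation recommended above, and single imputation tends to understate uncertainty.

```r
# Hypothetical toy data with sporadic NAs in the predictors.
toy <- data.frame(
  y  = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2),
  x1 = c(1, 2, NA, 4, 5, 6),
  x2 = c(0.5, NA, 0.7, 0.2, 0.9, 0.4)
)

# Stage 1: impute. Crude median fill, for illustration only.
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}
predictors  <- setdiff(names(toy), "y")
toy_imputed <- toy
toy_imputed[predictors] <- lapply(toy[predictors], impute_median)

# Stage 2: fit the regression on the completed data.
fit_cc  <- lm(y ~ ., data = toy)          # default: complete cases only
fit_imp <- lm(y ~ ., data = toy_imputed)  # all rows, after imputation

print(c(complete_case = nobs(fit_cc), imputed = nobs(fit_imp)))
```

The complete-case fit uses only the rows with no NA at all, while the imputed fit uses every row; how trustworthy the second fit is depends entirely on the imputation method.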
>> On Sun, Dec 2, 2018 at 7:24 PM Kyzar,Tricia E <[log in to unmask]> wrote:
>>>
>>> Thanks, Ben!
>>> I really hate to limit this dataset. One of the reasons I took this
>>> class was a hope to deal with this type of data. But, at least for
>>> the project,  I'm thinking of how I can quickly trim things down.  *sigh*
>>>
>>> Thanks for your help and the resources!
>>>
>>>
>>>
>>> -------- Original message --------
>>> From: Ben Bolker <[log in to unmask]>
>>> Date: 12/2/18 7:08 PM (GMT-05:00)
>>> To: [log in to unmask]
>>> Subject: Re: lm() with NA's
>>>
>>> Using a data set with NAs in it is definitely beyond basic statistics.
>>> This requires *some* kind of _imputation_ (see e.g.
>>> https://thomasleeper.com/Rcourse/Tutorials/mi.html). If you don't
>>> want to bother, you need to decide which predictor variables to drop,
>>> based on their importance and on the number and pattern of missing
>>> values.  Harrell's _Regression Modeling Strategies_ has a lot of
>>> practical details about this, and the Hmisc (or maybe rms) package has
>>> tools for viewing the patterns of missingness to help with these
>>> decisions; so does e.g. the Amelia package (missmap()).
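
Before reaching for those packages, the amount of missingness can be summarised in base R. A minimal sketch: the toy data frame is hypothetical (`mostly_na` stands in for a column like TON_2), and the 50% cut-off is an arbitrary illustration, not a recommendation from the thread.

```r
# Hypothetical toy data: 'mostly_na' stands in for a column like TON_2.
toy <- data.frame(
  y         = c(1.2, 0.8, 1.5, 1.1, 0.9, 1.4),
  temp      = c(13.3, 15.0, NA, 16.3, 17.2, 14.1),
  mostly_na = c(NA, NA, NA, NA, 0.47, NA)
)

# Fraction of missing values in each column.
na_frac <- colMeans(is.na(toy))
print(round(na_frac, 2))

# Rows that would survive row-wise deletion with all columns kept.
n_complete_all <- sum(complete.cases(toy))

# Drop columns above an arbitrary, illustrative threshold (50% missing).
keep        <- names(na_frac)[na_frac <= 0.5]
toy_trimmed <- toy[, keep, drop = FALSE]
n_complete_trimmed <- sum(complete.cases(toy_trimmed))

print(c(before = n_complete_all, after = n_complete_trimmed))
```

Dropping the one mostly-missing column raises the number of usable rows from 1 to 5 in this toy case, which is the trade-off between keeping predictors and keeping observations.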
>>> On Sun, Dec 2, 2018 at 6:14 PM Kyzar,Tricia E <[log in to unmask]> wrote:
>>>>
>>>> LOL  well, that presents a problem.  I ran the na.omit on my table
>>>> and it left me with 0 rows- which is what I expected.
>>>>
>>>> Is it just not possible to use a dataset that has NA’s in it?
>>>>
>>>>
>>>>
>>>> Tricia Kyzar
>>>>
>>>> Ph: 352-392-7260
>>>>
>>>> Email: [log in to unmask]
>>>>
>>>>
>>>>
>>>> From: UF R Users List <[log in to unmask]> On Behalf Of Toh,Kok Ben
>>>> Sent: Sunday, December 2, 2018 6:08 PM
>>>> To: [log in to unmask]
>>>> Subject: Re: lm() with NA's
>>>>
>>>>
>>>>
>>>> lm automatically removes any observation that contains NA (more
>>>> specifically, if one of the covariates in the model has NA, it
>>>> removes the entire row). So it may be that although you have four
>>>> levels (of station) in your dataset, after omitting NAs, you only
>>>> have one level.
>>>>
>>>>
>>>>
>>>> The following example demonstrates this problem: only observations in
>>>> station C have non-NA values for x:
>>>>
>>>>
>>>>
>>>>> df <- data.frame(station = rep(c("A", "B", "C"), each = 10), x =
>>>>> c(rep(NA, 20), 1:10), y = c(10:29, 2:11), stringsAsFactors = TRUE)
>>>>
>>>>> str(df)
>>>>
>>>> 'data.frame':  30 obs. of  3 variables:
>>>>
>>>> $ station: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
>>>>
>>>> $ x      : int  NA NA NA NA NA NA NA NA NA NA ...
>>>>
>>>> $ y      : int  10 11 12 13 14 15 16 17 18 19 ...
>>>>
>>>>> lm(y ~ ., df)
>>>>
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>>
>>>>  contrasts can be applied only to factors with 2 or more levels
>>>>
>>>>
>>>>
>>>> One way to check this is with the “na.omit” function, which
>>>> removes any row that contains even a single NA; this is essentially
>>>> what lm does here:
>>>>
>>>>> df2 <- na.omit(df)
>>>>
>>>>> table(df2$station)
>>>>
>>>>
>>>>
>>>> A  B  C
>>>>
>>>> 0  0 10
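
The same check can be turned into a search for the offending variable. A minimal sketch in base R, reusing the toy data frame from the example above; with many factor columns it lists every one that is left with fewer than two levels once incomplete rows are gone.

```r
df <- data.frame(
  station = factor(rep(c("A", "B", "C"), each = 10)),
  x       = c(rep(NA, 20), 1:10),
  y       = c(10:29, 2:11)
)

# Mimic lm()'s default handling: drop incomplete rows,
# then drop factor levels that no longer occur.
df2 <- droplevels(na.omit(df))

# Number of levels left in each factor column.
is_fac      <- vapply(df2, is.factor, logical(1))
levels_left <- vapply(df2[is_fac], nlevels, integer(1))
print(levels_left)

# Factors that can no longer be given contrasts.
problem_vars <- names(levels_left)[levels_left < 2]
print(problem_vars)
```

If na.omit() leaves zero rows, every factor shows up here with 0 levels; in that case the issue is the missingness itself rather than any single factor.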
>>>>
>>>>
>>>>
>>>> From: UF R Users List <[log in to unmask]> On Behalf Of
>>>> Kyzar,Tricia E
>>>> Sent: Sunday, December 2, 2018 5:44 PM
>>>> To: [log in to unmask]
>>>> Subject: Re: lm() with NA's
>>>>
>>>>
>>>>
>>>> Hey Ben!
>>>>
>>>>
>>>>
>>>> I really appreciate your response – I had been wondering about
>>>> taking smy out and converting station to a factor (with 4 levels).
>>>>  But before I continue I want to be sure to point out that this is
>>>> for our SML class project, so while I am allowed to ask for help
>>>> with my data issues, we can definitely not talk about the analysis
>>>> process itself.
>>>>
>>>>
>>>>
>>>> That being said, I reran the function without smy and station
>>>> (station has been replaced with stationID – again, a factor with 4
>>>> levels).
>>>>
>>>>> lm.fit=lm(TN_2~., Subs_TN, na.rm=TRUE)
>>>>
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>>
>>>>  contrasts can be applied only to factors with 2 or more levels
>>>>
>>>>
>>>>
>>>> My dependent variable, TN_2 is definitely numeric:
>>>>
>>>>> summary(Subs_TN$TN_2)
>>>>
>>>>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>
>>>> 0.0885  0.3294  0.4660  0.5067  0.6243  1.8332
>>>>
>>>>
>>>>
>>>> How can I look for the problem variable?
>>>>
>>>> I ran the str() function on the updated dataset and don’t see any
>>>> factors other than the one I made, and it shows as having 4 levels:
>>>>
>>>>> str(Subs_TN)
>>>>
>>>> 'data.frame':      578 obs. of  68 variables:
>>>>
>>>> $ Temp_2       : num  13.3 15 16 16.3 17.2 ...
>>>>
>>>> $ SpCond_2     : num  49 53.4 50.4 50.3 54.2 ...
>>>>
>>>> $ Sal_2        : num  32 35.3 33.1 33 35.9 ...
>>>>
>>>> $ DO_mgl_2     : num  7.96 7.57 7.49 7.88 7.84 ...
>>>>
>>>> $ Depth_2      : num  2.49 2.58 2.71 2.53 2.69 ...
>>>>
>>>> $ pH_2         : num  8.23 8.24 8.22 8.41 8.18 ...
>>>>
>>>> $ Turb_2       : num  6.18 2.27 10.77 4.25 13.87 ...
>>>>
>>>> $ DO_Pct_2     : num  92.6 93.1 92.4 98.2 100.8 ...
>>>>
>>>> $ PO4F_2       : num  0.0065 0.0165 0.0035 0.0085 0.015 0.0105 0.012
>>>> 0.0075 0.011 0.0155 ...
>>>>
>>>> $ TDP_2        : num  0.0235 0.0145 0.0135 0.025 0.033 0.021 NA NA
>>>> NA NA ...
>>>>
>>>> $ TP_2         : num  0.0345 0.028 0.051 0.032 0.039 0.029 0.069
>>>> 0.015 NA 0.0325 ...
>>>>
>>>> $ PHOSP_2      : num  0.011 0.0135 0.0375 0.007 0.006 0.008 NA NA NA
>>>> NA ...
>>>>
>>>> $ NH4F_2       : num  0.0205 0.0515 0.0475 0.064 0.035 0.024 0.025
>>>> NA 0.0255 0.0155 ...
>>>>
>>>> $ NO2F_2       : num  0.0014 0.00175 0.0039 0.0041 NA NA NA NA
>>>> 0.00195 NA ...
>>>>
>>>> $ NO3F_2       : num  NA 0.0045 0.0085 0.0014 NA NA NA NA 0.00955
>>>> 0.003 ...
>>>>
>>>> $ NO23F_2      : num  0.0014 0.00625 0.0124 0.0055 0.01245 ...
>>>>
>>>> $ DIN_2        : num  NA 0.058 0.06 0.0695 0.047 0.034 0.032 NA
>>>> 0.037 NA ...
>>>>
>>>> $ TDN_2        : num  0.181 0.152 0.137 0.329 0.244 ...
>>>>
>>>> $ TN_2         : num  0.281 0.203 0.186 0.354 0.274 ...
>>>>
>>>> $ PN_2         : num  0.1003 0.0506 0.0495 0.0253 0.0296 ...
>>>>
>>>> $ UncCHLa_N_2  : num  1.9 1.1 2.1 0.7 1.8 1.9 5.3 1.9 3.3 2.15 ...
>>>>
>>>> $ CHLA_N_2     : num  0.25 1.05 NA 0.8 1.55 1.6 4.8 1.8 2.8 1.75 ...
>>>>
>>>> $ PHEA_2       : num  1.05 0.155 NA 0.55 0.5 0.7 1.05 0.3 0.95 0.565 ...
>>>>
>>>> $ POC_2        : num  0.462 0.925 2.255 0.915 0.455 ...
>>>>
>>>> $ SiO4F_2      : num  0.378 1.891 1.347 0.448 1.092 ...
>>>>
>>>> $ TSS_2        : num  36 21.5 36 25 83.7 ...
>>>>
>>>> $ WTEM_N_2     : num  13.4 14.3 19.4 15.3 18.7 16.7 10.6 13.8 17.6
>>>> 13.7 ...
>>>>
>>>> $ SALT_N_2     : num  31.2 33.5 31.7 30.9 32.3 ...
>>>>
>>>> $ DO_N_2       : num  8.8 9.8 7 8.2 7 7.8 5.9 10.4 7.2 10.5 ...
>>>>
>>>> $ PH_N_2       : num  8 8.2 7.9 8.1 8 7.8 NA 6.5 NA 8.1 ...
>>>>
>>>> $ TURB_N_2     : num  2.9 2.75 3.75 3 5.1 3.55 0.8 1.95 7.4 NA ...
>>>>
>>>> $ SECCHI_2     : num  1.6 1.9 1.3 1.5 1.5 1.8 1.5 2 1.8 NA ...
>>>>
>>>> $ IRR0_N_2     : num  251 125 1212 684 762 ...
>>>>
>>>> $ IRR1_N_2     : num  26.8 26.4 366.2 183.8 236.6 ...
>>>>
>>>> $ Kd_N_2       : num  2.24 1.55 1.2 1.31 525.2 ...
>>>>
>>>> $ COLOR_2      : num  NA 4 4.5 NA 6.75 ...
>>>>
>>>> $ TON_2        : num  NA NA NA NA NA NA NA NA 0.474 NA ...
>>>>
>>>> $ FECCOL_CFU_2 : num  NA NA NA NA NA NA NA NA NA 7 ...
>>>>
>>>> $ ECOLI_MPN_2  : num  NA NA NA NA NA NA NA NA NA NA ...
>>>>
>>>> $ month        : num  1 1 1 1 1 1 1 1 1 1 ...
>>>>
>>>> $ year         : num  2003 2004 2005 2006 2007 ...
>>>>
>>>> $ stationID    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1
>>>> 1 1 ...
>>>>
>>>> $ Prct_1000    : num  18.4 18.4 18.4 18.4 18.4 ...
>>>>
>>>> $ Prct_3000    : num  6.07 6.07 6.07 6.07 6.07 ...
>>>>
>>>> $ Prct_4000    : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Prct_5000    : num  19.1 19.1 19.1 19.1 19.1 ...
>>>>
>>>> $ Prct_6000    : num  56.4 56.4 56.4 56.4 56.4 ...
>>>>
>>>> $ Prct_7000    : num  0.207 0.207 0.207 0.207 0.207 ...
>>>>
>>>> $ Prct_8000    : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Prct_9000    : num  11.9 11.9 11.9 11.9 11.9 ...
>>>>
>>>> $ PoorlyDrained: num  2e+06 2e+06 2e+06 2e+06 2e+06 ...
>>>>
>>>> $ WellDrained  : num  64587 64587 64587 64587 64587 ...
>>>>
>>>> $ KnownSewer   : int  50 50 50 50 50 50 50 50 50 50 ...
>>>>
>>>> $ KnownSeptic  : int  4 4 4 4 4 4 4 4 4 4 ...
>>>>
>>>> $ LikelySeptic : int  25 25 25 25 25 25 25 25 25 25 ...
>>>>
>>>> $ Residential  : int  141 141 141 141 141 141 141 141 141 141 ...
>>>>
>>>> $ Commercial   : int  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Industrial   : int  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Institutional: int  14 14 14 14 14 14 14 14 14 14 ...
>>>>
>>>> $ UtilTrans    : int  1 1 1 1 1 1 1 1 1 1 ...
>>>>
>>>> $ ParksRec     : int  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Res_SqM      : num  450947 450947 450947 450947 450947 ...
>>>>
>>>> $ Comm_SqM     : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Indust_SqM   : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Instit_SqM   : num  1294539 1294539 1294539 1294539 1294539 ...
>>>>
>>>> $ UtilTrans_SqM: num  103 103 103 103 103 ...
>>>>
>>>> $ Waters_SqM   : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ ParksRec_SqM : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>>
>>>>
>>>> Tricia Kyzar
>>>>
>>>> Ph: 352-392-7260
>>>>
>>>> Email: [log in to unmask]
>>>>
>>>>
>>>>
>>>> From: UF R Users List <[log in to unmask]> On Behalf Of Toh,Kok Ben
>>>> Sent: Sunday, December 2, 2018 4:23 PM
>>>> To: [log in to unmask]
>>>> Subject: Re: lm() with NA's
>>>>
>>>>
>>>>
>>>> Hi Tricia,
>>>>
>>>>
>>>>
>>>> The problem is with “station”. I believe there’s only one level in
>>>> your station variable, i.e. “gtmfm”, and R is wondering why you want
>>>> to fit a variable with one level in the linear model.
>>>>
>>>>
>>>>
>>>> The following code gives a similar error:
>>>>
>>>>> df <- data.frame(station = rep("gtmfm", 10), x = 1:10, y = 2:11)
>>>>
>>>>> lm(y ~ ., data=df)
>>>>
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>>
>>>>  contrasts can be applied only to factors with 2 or more levels
>>>>
>>>>
>>>>
>>>> When you use a formula like “y ~ .”, you regress y on all other
>>>> columns in your data frame, and that includes the station column.
>>>>
>>>>
>>>>
>>>> Moreover, the ‘smy’ variable is definitely a problem. There are 766
>>>> original levels in this variable and you have only 720 observations
>>>> in your data. By including smy as a covariate, you are asking the
>>>> model to fit a coefficient for each level of smy. So if there are 720
>>>> observations with 720 unique levels of smy in your dataset, then
>>>> together with all the other covariates you would be fitting a model of
>>>> ~790 covariates with only 720 observations.
>>>>
>>>>
>>>>
>>>> The following is a dummy example:
>>>>
>>>> df <- data.frame(station = letters, x = 1:26, y = 2:27)
>>>>
>>>> mod <- lm(y ~ ., data=df)
>>>>
>>>> summary(mod)
>>>>
>>>>
>>>>
>>>> You’ll see that summary(mod) gives you weird results and complains
>>>> about not having enough residual degrees of freedom.
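
The count behind this can be checked directly by comparing the number of columns in the design matrix with the number of rows. A minimal sketch in base R on the same dummy data; `station` is wrapped in factor() explicitly so the result does not depend on the R version's stringsAsFactors default.

```r
df <- data.frame(station = factor(letters), x = 1:26, y = 2:27)

# Design matrix that lm() would build for y ~ .
X <- model.matrix(y ~ ., data = df)
print(dim(X))   # rows = observations, columns = coefficients to estimate

mod <- lm(y ~ ., data = df)

# Residual degrees of freedom: observations minus estimable coefficients.
print(df.residual(mod))

# Coefficients that could not be estimated are reported as NA.
print(names(coef(mod))[is.na(coef(mod))])
```

Here the design matrix has 27 columns (intercept, 25 station dummies, x) for 26 rows, so no residual degrees of freedom remain and the coefficient for x cannot be estimated.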
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>> From: UF R Users List <[log in to unmask]> On Behalf Of
>>>> Kyzar,Tricia E
>>>> Sent: Sunday, December 2, 2018 2:27 PM
>>>> To: [log in to unmask]
>>>> Subject: Re: lm() with NA's
>>>>
>>>>
>>>>
>>>> Thank you for clarifying the na.rm question – I’m adding this to my
>>>> notes for future reference.
>>>>
>>>> I have looked at the structure of my table as you suggest.  I have
>>>> one factor variable with 766 (original) levels.  Is that ‘too’ many
>>>> levels?
>>>>
>>>>
>>>>
>>>>> str(swmp.lu)
>>>>
>>>> 'data.frame':      720 obs. of  70 variables:
>>>>
>>>> $ station      : chr  "gtmfm" "gtmfm" "gtmfm" "gtmfm" ...
>>>>
>>>> $ smy          : Factor w/ 766 levels "gtmfm_1_2002",..: 2 3 4 5 6 7
>>>> 8 9 10 11 ...
>>>>
>>>> $ Temp_2       : num  13.3 15 16 16.3 17.2 ...
>>>>
>>>> $ SpCond_2     : num  49 53.4 50.4 50.3 54.2 ...
>>>>
>>>> $ Sal_2        : num  32 35.3 33.1 33 35.9 ...
>>>>
>>>> $ DO_mgl_2     : num  7.96 7.57 7.49 7.88 7.84 ...
>>>>
>>>> $ Depth_2      : num  2.49 2.58 2.71 2.53 2.69 ...
>>>>
>>>> $ pH_2         : num  8.23 8.24 8.22 8.41 8.18 ...
>>>>
>>>> $ Turb_2       : num  6.18 2.27 10.77 4.25 13.87 ...
>>>>
>>>> $ DO_Pct_2     : num  92.6 93.1 92.4 98.2 100.8 ...
>>>>
>>>> $ PO4F_2       : num  0.0065 0.0165 0.0035 0.0085 0.015 0.0105
>>>> 0.0095 0.012 0.0075 0.011 ...
>>>>
>>>> $ TDP_2        : num  0.0235 0.0145 0.0135 0.025 0.033 0.021 NA NA
>>>> NA NA ...
>>>>
>>>> $ TP_2         : num  0.0345 0.028 0.051 0.032 0.039 0.029 NA 0.069
>>>> 0.015 NA ...
>>>>
>>>> $ PHOSP_2      : num  0.011 0.0135 0.0375 0.007 0.006 0.008 NA NA NA
>>>> NA ...
>>>>
>>>> $ NH4F_2       : num  0.0205 0.0515 0.0475 0.064 0.035 0.024 0.0265
>>>> 0.025 NA 0.0255 ...
>>>>
>>>> $ NO2F_2       : num  0.0014 0.00175 0.0039 0.0041 NA NA NA NA NA
>>>> 0.00195 ...
>>>>
>>>> $ NO3F_2       : num  NA 0.0045 0.0085 0.0014 NA NA NA NA NA 0.00955 ...
>>>>
>>>> $ NO23F_2      : num  0.0014 0.00625 0.0124 0.0055 0.01245 ...
>>>>
>>>> $ DIN_2        : num  NA 0.058 0.06 0.0695 0.047 0.034 0.0325 0.032
>>>> NA 0.037 ...
>>>>
>>>> $ TDN_2        : num  0.181 0.152 0.137 0.329 0.244 ...
>>>>
>>>> $ TN_2         : num  0.281 0.203 0.186 0.354 0.274 ...
>>>>
>>>> $ PN_2         : num  0.1003 0.0506 0.0495 0.0253 0.0296 ...
>>>>
>>>> $ UncCHLa_N_2  : num  1.9 1.1 2.1 0.7 1.8 1.9 2.85 5.3 1.9 3.3 ...
>>>>
>>>> $ CHLA_N_2     : num  0.25 1.05 NA 0.8 1.55 1.6 2.4 4.8 1.8 2.8 ...
>>>>
>>>> $ PHEA_2       : num  1.05 0.155 NA 0.55 0.5 0.7 0.85 1.05 0.3 0.95 ...
>>>>
>>>> $ POC_2        : num  0.462 0.925 2.255 0.915 0.455 ...
>>>>
>>>> $ SiO4F_2      : num  0.378 1.891 1.347 0.448 1.092 ...
>>>>
>>>> $ TSS_2        : num  36 21.5 36 25 83.7 ...
>>>>
>>>> $ WTEM_N_2     : num  13.4 14.3 19.4 15.3 18.7 16.7 15.4 10.6 13.8
>>>> 17.6 ...
>>>>
>>>> $ SALT_N_2     : num  31.2 33.5 31.7 30.9 32.3 32.8 32.7 28.2 34.5
>>>> 33.4 ...
>>>>
>>>> $ DO_N_2       : num  8.8 9.8 7 8.2 7 7.8 8 5.9 10.4 7.2 ...
>>>>
>>>> $ PH_N_2       : num  8 8.2 7.9 8.1 8 7.8 7.9 NA 6.5 NA ...
>>>>
>>>> $ TURB_N_2     : num  2.9 2.75 3.75 3 5.1 3.55 3.7 0.8 1.95 7.4 ...
>>>>
>>>> $ SECCHI_2     : num  1.6 1.9 1.3 1.5 1.5 1.8 1.8 1.5 2 1.8 ...
>>>>
>>>> $ IRR0_N_2     : num  251 125 1212 684 762 ...
>>>>
>>>> $ IRR1_N_2     : num  26.8 26.4 366.2 183.8 236.6 ...
>>>>
>>>> $ Kd_N_2       : num  2.24 1.55 1.2 1.31 525.2 ...
>>>>
>>>> $ COLOR_2      : num  NA 4 4.5 NA 6.75 ...
>>>>
>>>> $ TKN_2        : num  NA NA NA NA NA NA NA NA NA NA ...
>>>>
>>>> $ TON_2        : num  NA NA NA NA NA NA NA NA NA 0.474 ...
>>>>
>>>> $ FECCOL_CFU_2 : num  NA NA NA NA NA NA NA NA NA NA ...
>>>>
>>>> $ ECOLI_MPN_2  : num  NA NA NA NA NA NA NA NA NA NA ...
>>>>
>>>> $ month        : num  1 1 1 1 1 1 1 1 1 1 ...
>>>>
>>>> $ year         : num  2003 2004 2005 2006 2007 ...
>>>>
>>>> $ Prct_1000    : num  18.4 18.4 18.4 18.4 18.4 ...
>>>>
>>>> $ Prct_3000    : num  6.07 6.07 6.07 6.07 6.07 ...
>>>>
>>>> $ Prct_4000    : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Prct_5000    : num  19.1 19.1 19.1 19.1 19.1 ...
>>>>
>>>> $ Prct_6000    : num  56.4 56.4 56.4 56.4 56.4 ...
>>>>
>>>> $ Prct_7000    : num  0.207 0.207 0.207 0.207 0.207 ...
>>>>
>>>> $ Prct_8000    : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Prct_9000    : num  11.9 11.9 11.9 11.9 11.9 ...
>>>>
>>>> $ PoorlyDrained: num  2e+06 2e+06 2e+06 2e+06 2e+06 ...
>>>>
>>>> $ WellDrained  : num  64587 64587 64587 64587 64587 ...
>>>>
>>>> $ KnownSewer   : int  50 50 50 50 50 50 50 50 50 50 ...
>>>>
>>>> $ KnownSeptic  : int  4 4 4 4 4 4 4 4 4 4 ...
>>>>
>>>> $ LikelySeptic : int  25 25 25 25 25 25 25 25 25 25 ...
>>>>
>>>> $ Residential  : int  141 141 141 141 141 141 141 141 141 141 ...
>>>>
>>>> $ Commercial   : int  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Industrial   : int  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Institutional: int  14 14 14 14 14 14 14 14 14 14 ...
>>>>
>>>> $ UtilTrans    : int  1 1 1 1 1 1 1 1 1 1 ...
>>>>
>>>> $ ParksRec     : int  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Res_SqM      : num  450947 450947 450947 450947 450947 ...
>>>>
>>>> $ Comm_SqM     : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Indust_SqM   : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ Instit_SqM   : num  1294539 1294539 1294539 1294539 1294539 ...
>>>>
>>>> $ UtilTrans_SqM: num  103 103 103 103 103 ...
>>>>
>>>> $ Waters_SqM   : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>> $ ParksRec_SqM : num  0 0 0 0 0 0 0 0 0 0 ...
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Tricia Kyzar
>>>>
>>>> Ph: 352-392-7260
>>>>
>>>> Email: [log in to unmask]
>>>>
>>>>
>>>>
>>>> From: UF R Users List <[log in to unmask]> On Behalf Of
>>>> Klarenberg,Geraldine
>>>> Sent: Sunday, December 2, 2018 2:19 PM
>>>> To: [log in to unmask]
>>>> Subject: Re: lm() with NA's
>>>>
>>>>
>>>>
>>>> Also, FYI: setting na.rm to TRUE does exactly what you indicate
>>>> you want it to do. It ignores the NA values in the calculations; it
>>>> doesn't actually remove them from your data frame.
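
For reference, na.rm is an argument of summary functions such as mean(), whereas lm() controls missing-value handling through na.action. A minimal sketch in base R; the toy data frame is hypothetical.

```r
# na.rm belongs to summary functions such as mean().
x <- c(1, NA, 3)
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 2

# lm() uses na.action instead. Hypothetical toy data with one NA in x.
toy <- data.frame(x = c(1, 2, NA, 4, 5),
                  y = c(2.1, 3.9, 6.2, 8.1, 9.8))

fit_omit    <- lm(y ~ x, data = toy, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

# Both fits drop the incomplete row and use the same 4 observations ...
nobs(fit_omit)
nobs(fit_exclude)

# ... but na.exclude pads residuals back to the original length with NA.
length(residuals(fit_omit))      # 4
length(residuals(fit_exclude))   # 5
```

Either way the original data frame `toy` is left unchanged, and rows with an NA in any model variable do not enter the fit.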
>>>>
>>>>
>>>>
>>>> On Dec 2, 2018 13:54, "Kyzar,Tricia E" <[log in to unmask]> wrote:
>>>>
>>>> Thank you for replying Geraldine.  I will admit I’m a bit unsure how
>>>> to set these options up.  I ‘thought’ that setting na.rm=FALSE meant
>>>> that it would not delete the rows where there are NA’s, because I do
>>>> want to keep the rows, but ignore those ‘cells.’
>>>>
>>>>
>>>>
>>>> I’ve also tried running it the way you suggest and still get the
>>>> same error message:
>>>>
>>>>> lm.fit=lm(TN_2~., Subs_TN, na.rm=TRUE)
>>>>
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>>
>>>>  contrasts can be applied only to factors with 2 or more levels
>>>>
>>>>
>>>>
>>>> Thank you for any help!
>>>>
>>>>
>>>>
>>>> Tricia Kyzar
>>>>
>>>> Ph: 352-392-7260
>>>>
>>>> Email: [log in to unmask]
>>>>
>>>>
>>>>
>>>> From: UF R Users List <[log in to unmask]> On Behalf Of
>>>> Klarenberg,Geraldine
>>>> Sent: Sunday, December 2, 2018 1:47 PM
>>>> To: [log in to unmask]
>>>> Subject: Re: lm() with NA's
>>>>
>>>>
>>>>
>>>> In the first version, have you tried setting "na.rm=TRUE"? Because
>>>> in the way you wrote it, you're telling R to keep the NAs, which is
>>>> not what you want (na.rm stands for "remove NA", so if you set it to
>>>> TRUE, you are removing them).
>>>>
>>>> However, the error message seems to be related to factor levels, so
>>>> there might be a different problem there. The error message implies
>>>> you don't have more than 1 level?
>>>>
>>>>
>>>>
>>>> Geraldine
>>>>
>>>>
>>>>
>>>> On Dec 2, 2018 13:24, "Kyzar,Tricia E" <[log in to unmask]> wrote:
>>>>
>>>> I have a dataset: Subs_TN – this is a subset of my full dataset, in
>>>> this subset there are no NA’s for my dependent variable (TN_2)
>>>>
>>>> But there ARE NA’s in places in all my other predictor columns.
>>>>
>>>>
>>>>
>>>> I want to run a linear regression model using all of my predictor
>>>> columns (this is a real world dataset and sporadic NA’s will always
>>>> be there), but I’m having problems figuring out how to work around this.
>>>>
>>>>
>>>>
>>>> Below are examples of how I’ve set up the lm() function and the
>>>> errors I’ve gotten back.  How can I perform linear regression when
>>>> there are NA’s in my predictor variables?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> lm.fit=lm(TN_2~., Subs_TN, na.rm=FALSE)
>>>>
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>>
>>>>  contrasts can be applied only to factors with 2 or more levels
>>>>
>>>>
>>>>
>>>>> lm.fit=lm(TN_2~., Subs_TN, na.action=na.exclude)
>>>>
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>>
>>>>  contrasts can be applied only to factors with 2 or more levels
>>>>
>>>>
>>>>
>>>>> lm.fit=lm(TN_2~., Subs_TN, na.action=na.omit)
>>>>
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>>
>>>>  contrasts can be applied only to factors with 2 or more levels
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thank you in advance for your advice!
>>>>
>>>> ~ Tricia Kyzar
>>>>
>>>>
>>>>
>>>> This list strives to be beginner friendly. However, we still ask
>>>> that you PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>

This list strives to be beginner friendly.  However, we still ask that you
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
