Not sure if this is quite right, but you could trying skipping the loop and doing a dplyr group
_group with ID_LOT, and then mutate the new column using the cumsum function:
dat %>%
group_by(ID_LOT) %>%
mutate(Cum_prop = cumsum(Proportion_DEF))
This works with your reproducible example:
# Load package
library(tidyverse)
# Write example
dat < tribble(
~ID_LOT, ~YEAR, ~Proportion_DEF, ~cohort, ~lot_year, ~ID_HH, ~A_DEF_PERIOD,
1, 2015, 0.47, 2015, "1_2015", 89, 55.473,
1, 2016, 0.19, 2015, "1_2016", 89, 22.334,
2, 2015, 0.71, 2015, "2_2015", 89, 69.181,
2, 2016, 0.11, 2015, "2_2016", 89, 10.202
)
# Create cummulative sum
dat %>%
group_by(ID_LOT) %>%
mutate(Cum_prop = cumsum(Proportion_DEF))
Would that work?
Justin Millar
Original Message
From: UF R Users List <[log in to unmask]> On Behalf Of Cardoso Carrero,Gabriel
Sent: Tuesday, June 19, 2018 4:25 PM
To: [log in to unmask]
Subject: How to fill up a column using certain criterias
Good afternoon everybody
I’m having a hard time to fill up a column with the sum of certain values of other columns.
…..
The table has over 19,000 lines, because each YEAR I have a proportion deforested in a given LOT.
I created a new column with Cumulative_def_proportion, which I want to fill up with the SUM of cumulative deforestation proportion of i lot for a certain i YEAR.
Example:
ID_LOT YEAR Proportion_DEF cohort lot_year ID_HH A_DEF_PERIOD
1 1 2015 0.47 2015 1_2015 89 55.473
2 1 2016 0.19 2015 1_2016 89 22.334
3 2 2015 0.71 2015 2_2015 89 69.181
4 2 2016 0.11 2015 2_2016 89 10.202
What I want is that in a new column (Cum_prop), in line 2 would appear the cumulative proportion of deforestation of lot 1 up to 2016, which is 0.66, and in line 4 it would be 0.82 for lot 2.
I have in total 13 YEARs and lot cohort could be any of those 13 years.
The looping is running, but it returns the Cum_prop column empty=NA
BELOW IS THE SCRIPT I GOT SO FAR…I tried to recycle one script I had, but not sure if it is the same rationale or if I would have a simpler way to do that.
I think the problem is in filling the table because I’m not sure I am using %in% correctly or is.na
##########
dat= read.csv("Dados_final_cohortXY_prop.csv")
dat$Cum_prop= NA
v.years = sort(unique(dat$YEAR))
names(dat)
system.time( #calculates time used
for(i in 1:length(v.years)){
t.year = v.years[i] #selects year "i"
t.table = dat[dat$YEAR <= t.year, ] #creates a temporary table only with data up to year "i"
t.sum.lot = aggregate(list(prop.sum = t.table$Proportion_DEF), list(ID_LOT = t.table$ID_LOT), sum) #sum proportions of def per ID_LOT
prop.sum = t.sum.lot$prop.sum
length(t.sum.lot) #checking
#fill table
# where vector Cum_prop is NA and with lot within do vector “t.sum.lot" it will be the sum of sum.prop
dat$Cum_prop[dat$ID_LOT %in% t.sum.lot &
is.na(dat$Cum_prop)] = prop.sum
#remove objetos temporarios
rm(t.year, t.table, t.sum.lot)
#mostra o decorrimento do looping
print(paste(i, "/", length(v.years)))
} #end looping "i"
) #end system.time
##########
Thank you very much in advance
Gabriel
Gabriel C. Carrero
PhD Student
Department of Geography
University of Florida
(352) 8880234
P.O. Box 117315
This list strives to be beginner friendly. However, we still ask that you
PLEASE do read the posting guide http://www.Rproject.org/postingguide.html
and provide commented, minimal, selfcontained, reproducible code.
