Hi Nihal,

Your example data can't be read in this format, but I think the problems here are:
1. Column names with spaces or other special characters are awkward in R in general, and in dplyr in particular, because they have to be wrapped in backticks every time. Use 'colnames(d1) <- ...' to change them to names with no spaces.
2. You don't refer to a column with $ notation inside dplyr functions, i.e. no 'd1$XXX'; use the bare column name instead.
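
To illustrate both points, here is a minimal sketch. The three-row d1 is a made-up stand-in for your data (the values are copied from your post), and the new names record_id, genotype and med are just my choice, so adjust them as you like:

```r
library(dplyr)

# Hypothetical stand-in for d1, keeping the original column names with spaces
d1 <- data.frame(
  `Record ID` = c(3, 3, 13),
  `CYP2C19 Genotype` = c("*1/*1", "*1/*1", "*1/*17"),
  `Med Order Display Name` = c("pantoprazole (PROTONIX) injection 40 mg",
                               "pantoprazole (PROTONIX) EC tablet 40 mg",
                               "pantoprazole (PROTONIX) 40 MG tablet"),
  check.names = FALSE  # stop data.frame() from rewriting the names itself
)

# Point 1: replace the awkward names with short ones (assumed names)
colnames(d1) <- c("record_id", "genotype", "med")

# Point 2: inside dplyr verbs, use bare column names, not d1$genotype
res <- d1 %>%
  group_by(genotype) %>%
  summarise(n = n())
```

After the rename, no backticks are needed anywhere in the pipeline.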

I've generated a fake dataset here:
df <- data.frame(med = rep(c('A', 'B', 'C'), each=10), 
                 gen = rep(1:3, 10), 
                 rec = c(rep(1:2, 10), rep(3, 10)),
                 val = runif(30))

And here's how you can get the count of unique records for each combination of "med" and "gen":
df %>% 
  group_by(med, gen) %>% 
  distinct(rec) %>% 
  summarise(n())

Note that I would recommend 'summarise(n=n())' over 'summarise(n())', because the latter names the count column 'n()', and a column name with special characters is not much fun to work with. You can also use 'tally()' instead of 'summarise(n())'; it gives the same counts and names the column 'n'.
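
For comparison, here is a sketch of the named variants side by side on the same fake dataset (val is left out because it isn't needed). The third version uses dplyr's n_distinct(), which is an alternative I'm adding here rather than something from the code above; it counts unique values directly, so the distinct() step can be dropped:

```r
library(dplyr)

# Same fake dataset as above, without the val column
df <- data.frame(med = rep(c('A', 'B', 'C'), each = 10),
                 gen = rep(1:3, 10),
                 rec = c(rep(1:2, 10), rep(3, 10)))

# Named count column after de-duplicating rec within each med-gen group
by_summarise <- df %>%
  group_by(med, gen) %>%
  distinct(rec) %>%
  summarise(n = n())

# tally() names the count column 'n' by default
by_tally <- df %>%
  group_by(med, gen) %>%
  distinct(rec) %>%
  tally()

# n_distinct() counts unique rec values directly, no distinct() needed
by_n_distinct <- df %>%
  group_by(med, gen) %>%
  summarise(n = n_distinct(rec))
```

All three should give one row per med-gen combination with the count in a column called n.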

You can also use the tapply function to achieve a slightly different result:
with(df, tapply(rec, list(med, gen), function (x) length(unique(x))))

This gives you a table in which each row is a level of med, each column is a level of gen, and each cell is the number of unique rec values for that row-column (med-gen) combination.
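
Applied to data shaped like yours, the same idea might look like the sketch below. I've typed in a few rows from your post and assumed the columns have already been renamed to record_id, genotype and med (my names, not yours). Genotype goes first in the list so that it ends up on the rows:

```r
# A few rows taken from the posted example, with shortened column names
d1 <- data.frame(
  record_id = c(3, 3, 3, 13, 13, 28, 28),
  genotype  = c("*1/*1", "*1/*1", "*1/*1", "*1/*17", "*1/*17",
                "*1/*1", "*1/*1"),
  med       = c("pantoprazole (PROTONIX) injection 40 mg",
                "pantoprazole (PROTONIX) injection 40 mg",
                "pantoprazole (PROTONIX) EC tablet 40 mg",
                "pantoprazole (PROTONIX) 40 MG Tablet Delayed Release",
                "pantoprazole (PROTONIX) 40 MG tablet",
                "pantoprazole (PROTONIX) EC tablet 40 mg",
                "pantoprazole (PROTONIX) 40 MG tablet"),
  stringsAsFactors = FALSE
)

# Rows are genotypes, columns are medications,
# cells are the number of unique record IDs
tab <- with(d1, tapply(record_id, list(genotype, med),
                       function(x) length(unique(x))))

# Combinations that never occur come back as NA; set them to 0 if preferred
tab[is.na(tab)] <- 0
tab
```

Here records 3 and 28 are both *1/*1 and both received the EC tablet, so that cell is 2 even though record 3 appears on several rows.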

Cheers,
Ben


-----Original Message-----
From: UF R Users List <[log in to unmask]> On Behalf Of El Rouby,Nihal M
Sent: Wednesday, July 11, 2018 4:59 PM
To: [log in to unmask]
Subject: R question

Dear All-



I have data with genotype and medication exposure on repeated dates. I'm trying to tabulate the counts of genotypes for unique individuals in each medication group. I tried several pieces of code to summarize the data by genotype and medication, but with no luck.



I used summarize and group_by from dplyr



output<-d1 %>%

group_by(d1$`Med Order Display Name`,d1$`CYP2C19 Genotype`) %>% distinct(d1$`Record ID`)%>%summarise(n())



Other code I tried:



with(d1, tapply(d1$`Med Order Display Name`, d1$`CYP2C19 Genotype`, FUN = function(x) length(unique(x))))



I appreciate your input on a direction I should take.







My example data is



Record ID  CYP2C19 Genotype  Med Order Display Name
3          *1/*1             pantoprazole (PROTONIX) injection 40 mg
3          *1/*1             pantoprazole (PROTONIX) injection 40 mg
3          *1/*1             pantoprazole (PROTONIX) EC tablet 40 mg
13         *1/*17            pantoprazole (PROTONIX) 40 MG Tablet Delayed Release
13         *1/*17            pantoprazole (PROTONIX) 40 MG Tablet Delayed Release
13         *1/*17            pantoprazole (PROTONIX) 40 MG tablet
13         *1/*17            pantoprazole (PROTONIX) 40 MG tablet
28         *1/*1             esomeprazole (NexIUM) capsule 20 mg
28         *1/*1             pantoprazole (PROTONIX) EC tablet 40 mg
28         *1/*1             pantoprazole (PROTONIX) 40 MG tablet
28         *1/*1             esomeprazole (NexIUM) capsule 40 mg
52         *1/*1             NEXIUM 40 MG Capsule Delayed Release
52         *1/*1             NEXIUM 40 MG Capsule Delayed Release
52         *1/*1             esomeprazole (NexIUM) 40 MG Capsule Delayed Release
52         *1/*1             NEXIUM 40 MG PO Capsule Delayed Release





I hope I can get an output like that





Columns:
  pantoprazole (PROTONIX) injection 40 mg
  pantoprazole (PROTONIX) EC tablet 40 mg
  pantoprazole (PROTONIX) 40 MG Tablet Delayed Release
  pantoprazole (PROTONIX) 40 MG tablet
  esomeprazole (NexIUM) capsule 20 mg
  NEXIUM 40 MG Capsule Delayed Release
  NEXIUM 40 MG PO Capsule Delayed Release
  esomeprazole (NexIUM) 40 MG Capsule Delayed Release

Rows:
  *1/*2         xx  xx  xx  xx
  *1/*17        xx  xx  xx  xx
  *1/*2         xx  xx  xx  xx
  inconclusive  xx  xx  xx  xx
  *2/*2         xx  xx  xx  xx
  *17/*17       xx  xx  xx  xx









This list strives to be beginner friendly.  However, we still ask that you
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.