- Currently a Data Scientist at Trinity Health
- Previously a Data Analyst for a large physician practice
- MA Economics (Econometrics, Environmental Econ)
July 9, 2015
Maximally Accurate
Maximally Interpretable
   | lhs | rhs | support | confidence | lift
---|---|---|---|---|---
16 | {b,c} | {d} | 0.4 | 1 | 2.5
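RHS (Outcome) = {d}
LHS (Inputs) = {b,c}
Support = 0.4: {b,c} and {d} occur together in 40% of all transactions.
Confidence = 1: {d} occurs in 100% of the transactions that contain {b,c}, so {b,c} itself also occurs in 40% of all transactions.
Lift = 2.5: that 100% is 2.5 times the rate of {d} in the population at large, i.e. {d} occurs in 1 / 2.5 = 40% of all transactions.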
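To make the three measures concrete, here is a minimal base-R sketch on a hypothetical set of five transactions, constructed so that the rule {b,c} => {d} reproduces the support, confidence and lift in the table above. The helpers `support()` and `rule_metrics()` are illustrative names, not arules functions.

```r
# Hypothetical toy transactions (not the talk's data), built so that
# {b,c} => {d} has support 0.4, confidence 1 and lift 2.5
transactions <- list(
  c("a", "b", "c", "d"),
  c("b", "c", "d"),
  c("a"),
  c("a", "b"),
  c("c")
)

# Fraction of transactions that contain every item in `items`
support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

rule_metrics <- function(lhs, rhs, trans) {
  supp_rule <- support(c(lhs, rhs), trans)  # P(LHS and RHS)
  conf <- supp_rule / support(lhs, trans)    # P(RHS | LHS)
  lift <- conf / support(rhs, trans)         # P(RHS | LHS) / P(RHS)
  c(support = supp_rule, confidence = conf, lift = lift)
}

m <- rule_metrics(c("b", "c"), "d", transactions)
print(m)
```

Lift above 1 means the LHS makes the outcome more likely than its base rate; here {d} goes from 40% overall to 100% when {b,c} is present.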
library(arules)
rules <- apriori(data,
                 parameter = list(minlen = 1, support = 0.05, confidence = 0.4, maxlen = 3),
                 appearance = list(rhs = outcomelist, default = "lhs"),
                 control = list(verbose = TRUE))
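The parameters in the apriori() call act as filters on which rules are kept. The base-R sketch below applies the same filters by brute force: it enumerates every candidate LHS instead of using Apriori's level-wise pruning (any superset of an infrequent itemset is itself infrequent), so it is only practical for a handful of items. The data and the helper `mine_rules()` are hypothetical; `rhs_items` plays the role of `outcomelist`, and rules with an empty LHS, which apriori() can also return when minlen = 1, are skipped.

```r
# Hypothetical toy data; brute-force enumeration, NOT the Apriori algorithm itself
transactions <- list(
  c("a", "b", "c", "d"),
  c("b", "c", "d"),
  c("a"),
  c("a", "b"),
  c("c")
)

support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

mine_rules <- function(trans, rhs_items, min_support, min_confidence, maxlen) {
  # Like appearance = list(rhs = ..., default = "lhs"): outcome items only on the RHS
  lhs_pool <- setdiff(sort(unique(unlist(trans))), rhs_items)
  out <- list()
  # maxlen counts LHS + RHS items, so the LHS holds at most maxlen - 1 items
  for (k in seq_len(min(maxlen - 1, length(lhs_pool)))) {
    for (lhs in combn(lhs_pool, k, simplify = FALSE)) {
      for (rhs in rhs_items) {
        s <- support(c(lhs, rhs), trans)
        if (s < min_support) next          # support threshold
        conf <- s / support(lhs, trans)
        if (conf < min_confidence) next    # confidence threshold
        out[[length(out) + 1]] <- data.frame(
          lhs = paste0("{", paste(lhs, collapse = ","), "}"),
          rhs = paste0("{", rhs, "}"),
          support = s, confidence = conf,
          lift = conf / support(rhs, trans),
          stringsAsFactors = FALSE)
      }
    }
  }
  do.call(rbind, out)
}

rules <- mine_rules(transactions, rhs_items = "d",
                    min_support = 0.05, min_confidence = 0.4, maxlen = 3)
print(rules[order(-rules$lift), ])
```

On this toy data {a} => {d} is dropped (confidence 1/3 is below 0.4), while {a,c} => {d} and {b,c} => {d} both reach lift 2.5.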
490k records from New York City restaurant inspections (https://data.cityofnewyork.us)
Requires readr, dplyr, tidyr, arules, arulesViz
Fields of interest: INSID, BORO, CUISINE_DESCRIPTION, VIOLATION_TYPE
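library(dplyr)
library(tidyr)
library(Matrix)

## 3. Use tidyr::gather to put data in 'long' format
nycs <- nycw %>%
  select(INSID, BORO, CUISINE_DESCRIPTION, VIOLATION_TYPE) %>%
  gather(MEASURE, VALUE, -INSID)
nycs$MEASURE <- nycs$VALUE   # the item is the field's value, e.g. "BROOKLYN"
nycs$VALUE <- 1              # presence indicator
nycs <- distinct(nycs)       # drop repeated (inspection, item) pairs; sparseMatrix() would sum them

## 4. Replace keys and measures with integer IDs
ID <- unique(nycs$INSID)
ME <- unique(nycs$MEASURE)
nycs$MEASURE <- match(nycs$MEASURE, ME)
nycs$INSID <- match(nycs$INSID, ID)

## 5. Convert to sparse matrix (rows = inspections, columns = items)
## CsparseMatrix is the default; giveCsparse is deprecated in recent Matrix versions
sm <- sparseMatrix(i = nycs$INSID, j = nycs$MEASURE, x = nycs$VALUE,
                   dimnames = list(ID, ME))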
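Steps 3 to 5 can be checked on a few invented rows. This base-R sketch swaps tidyr::gather for a plain rbind over the non-key fields and uses a dense logical matrix in place of sparseMatrix(), which is fine at toy size but not for 490k records. The rows and the names `long` and `incidence` are hypothetical.

```r
# Hypothetical mini version of the inspection data (invented rows)
nycw <- data.frame(
  INSID = c("i1", "i1", "i2", "i3"),
  BORO = c("BROOKLYN", "BROOKLYN", "MANHATTAN", "BROOKLYN"),
  CUISINE_DESCRIPTION = c("Caribbean", "Caribbean", "Chinese", "Pizza/Italian"),
  VIOLATION_TYPE = c("Rodents/Pests", "Temperature", "Temperature", "Rodents/Pests"),
  stringsAsFactors = FALSE
)

# Step 3 in base R: one (INSID, item) row per non-key field value
fields <- setdiff(names(nycw), "INSID")
long <- do.call(rbind, lapply(fields, function(f)
  data.frame(INSID = nycw$INSID, ITEM = nycw[[f]], stringsAsFactors = FALSE)))
long <- unique(long)  # i1's borough and cuisine appear once per violation row

# Step 4: integer positions for inspections (rows) and items (columns)
ID <- unique(long$INSID)
ME <- unique(long$ITEM)

# Step 5, dense stand-in for sparseMatrix(): TRUE where an inspection has an item
incidence <- matrix(FALSE, nrow = length(ID), ncol = length(ME),
                    dimnames = list(ID, ME))
incidence[cbind(match(long$INSID, ID), match(long$ITEM, ME))] <- TRUE
print(incidence * 1L)
```

With arules loaded, a logical matrix of this shape (rows = transactions, columns = items) should coerce with `as(incidence, "transactions")`.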
rules <- apriori(sm2,
                 parameter = list(minlen = 1, supp = 0.001, conf = 0.4, maxlen = 4),
                 appearance = list(rhs = outcomelist, default = "lhs"),
                 control = list(verbose = TRUE))
outcomelist
## [1] "Food Handling" "Rodents/Pests" "Employee Practices"
## [4] "Temperature" "Adminstrative" "Poor Equipment"
## [7] "Food Source"
class(sm2)
## [1] "transactions"
## attr(,"package")
## [1] "arules"
30 rules were generated. The top 5, sorted by lift, are:
   | lhs | rhs | support | confidence | lift
---|---|---|---|---|---
23 | {BROOKLYN,Caribbean} | {Rodents/Pests} | 0.008 | 0.605 | 1.396
1 | {Delicatessen} | {Temperature} | 0.010 | 0.599 | 1.353
5 | {Caribbean} | {Rodents/Pests} | 0.016 | 0.570 | 1.315
35 | {MANHATTAN,Chinese} | {Temperature} | 0.015 | 0.575 | 1.299
2 | {Pizza/Italian} | {Temperature} | 0.012 | 0.556 | 1.257
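Because lift = confidence / support(RHS) and support = support(LHS) × confidence, the table also implies the overall rate of each outcome and the share of inspections matching each LHS. A small R sketch of that arithmetic, using the rounded values reported above (so the derived numbers are approximate); the column names `rhs_base_rate` and `lhs_share` are illustrative.

```r
# Top-5 rules as reported in the table above (values rounded to 3 decimals)
top5 <- data.frame(
  lhs = c("{BROOKLYN,Caribbean}", "{Delicatessen}", "{Caribbean}",
          "{MANHATTAN,Chinese}", "{Pizza/Italian}"),
  rhs = c("Rodents/Pests", "Temperature", "Rodents/Pests",
          "Temperature", "Temperature"),
  support = c(0.008, 0.010, 0.016, 0.015, 0.012),
  confidence = c(0.605, 0.599, 0.570, 0.575, 0.556),
  lift = c(1.396, 1.353, 1.315, 1.299, 1.257),
  stringsAsFactors = FALSE
)

# lift = confidence / support(rhs)  =>  support(rhs) = confidence / lift
top5$rhs_base_rate <- top5$confidence / top5$lift
# support = support(lhs) * confidence  =>  support(lhs) = support / confidence
top5$lhs_share <- top5$support / top5$confidence

print(top5[, c("lhs", "rhs", "rhs_base_rate", "lhs_share")], digits = 3)
```

Both Rodents/Pests rules imply a base rate near 43%, and all three Temperature rules a base rate near 44%, which is a useful consistency check on the reported numbers.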
library(arulesViz)  # provides the plot() methods for rules
plot(rules, method = "grouped")
plot(rules, method = "paracoord")
plot(rules, method = "graph")