- Currently a Data Scientist at Trinity Health
- Previously a Data Analyst for a large physician practice
- MA Economics (Econometrics, Environmental Econ)
July 9, 2015
Maximally Accurate
Maximally Interpretable
| | lhs | rhs | support | confidence | lift |
|---|---|---|---|---|---|
| 16 | {b,c} | {d} | 0.4 | 1 | 2.5 |
RHS (Outcome) = {d}
LHS (Inputs) = {b,c}
Support: {b,c} and {d} occur together in 40% of all transactions.
Confidence: {d} occurs in 100% of the transactions that contain {b,c}.
Lift: that rate is 2.5 times the rate of {d} in the population at large.
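The three measures for this rule can be reproduced by hand. The sketch below uses a made-up set of five transactions (not from the talk), chosen so that {b,c} => {d} yields the same support, confidence, and lift as the row above; `support_of` is a hypothetical helper, not an arules function.

```r
# Hypothetical toy data: five transactions invented for illustration
transactions <- list(
  c("b", "c", "d"),
  c("a", "b", "c", "d"),
  c("a", "b"),
  c("a", "c"),
  c("a", "e")
)

# Share of transactions that contain every item in `items`
support_of <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

lhs <- c("b", "c")
rhs <- "d"

# support:    share of transactions containing LHS and RHS together
# confidence: share of LHS transactions that also contain RHS
# lift:       confidence relative to the overall rate of RHS
support    <- support_of(c(lhs, rhs), transactions)
confidence <- support / support_of(lhs, transactions)
lift       <- confidence / support_of(rhs, transactions)

print(c(support = support, confidence = confidence, lift = lift))
```

Here {d} appears in 2 of 5 transactions (40%), so a rule that predicts it with 100% confidence has a lift of 1 / 0.4 = 2.5.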
apriori(data,
parameter = list(minlen=1,
support=0.05,
confidence=0.4,
maxlen=3),
appearance = list(rhs=outcomelist,
default="lhs"),
control = list(verbose=T))
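As a minimal sketch of how such a call behaves, the example below runs `apriori` on five made-up transactions with the right-hand side restricted to a single outcome. The data and the threshold values are assumptions for illustration; `minlen=2` is used here (instead of `minlen=1`) only to skip rules with an empty left-hand side.

```r
library(arules)

# Hypothetical toy data: five transactions invented for illustration
trans <- as(list(
  c("b", "c", "d"),
  c("a", "b", "c", "d"),
  c("a", "b"),
  c("a", "c"),
  c("a", "e")
), "transactions")

# Only "d" may appear on the RHS; every other item may appear on the LHS
rules <- apriori(trans,
                 parameter  = list(minlen = 2, support = 0.2,
                                   confidence = 0.4, maxlen = 3),
                 appearance = list(rhs = "d", default = "lhs"),
                 control    = list(verbose = FALSE))

# Show the strongest rules first
inspect(head(sort(rules, by = "lift"), 5))
```

Restricting `rhs` keeps the search focused on rules that predict the outcomes of interest, which is the same role `outcomelist` plays in the call above.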
490k records from New York City restaurant inspections
requires readr, dplyr, tidyr, Matrix, arules, arulesViz (data from https://data.cityofnewyork.us)
Fields of interest:
## 3. Use tidyr::gather to put data in 'long' format
nycs <- nycw %>%
select(INSID, BORO, CUISINE_DESCRIPTION, VIOLATION_TYPE) %>%
gather(MEASURE, VALUE, -INSID)
nycs$MEASURE <- nycs$VALUE
nycs$VALUE <- 1
## 4. Replace keys and measures with integer IDs
ID <- unique(nycs$INSID)
ME <- unique(nycs$MEASURE)
nycs$MEASURE <- match(nycs$MEASURE, ME)
nycs$INSID <- match(nycs$INSID, ID)
## 5. Convert to sparse matrix
sm <- sparseMatrix(i = nycs$INSID, j = nycs$MEASURE, x = nycs$VALUE,
                   dimnames = list(ID, ME), giveCsparse = TRUE)
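Steps 4 and 5 can be tried on a small scale. The sketch below uses a made-up long-format data frame (the inspection IDs and attribute values are hypothetical) to show how `match` against the `unique` vectors produces row and column indices, and how `sparseMatrix` from the Matrix package turns them into a one-row-per-inspection indicator matrix.

```r
library(Matrix)

# Hypothetical long-format data: one row per (inspection, attribute) pair
long <- data.frame(
  INSID   = c("A1", "A1", "B7", "B7", "C3"),
  MEASURE = c("BROOKLYN", "Pizza", "BROOKLYN", "Chinese", "Pizza"),
  stringsAsFactors = FALSE
)

# Each distinct key and measure gets its position in the unique vector as ID
ID <- unique(long$INSID)
ME <- unique(long$MEASURE)
row_idx <- match(long$INSID, ID)
col_idx <- match(long$MEASURE, ME)

# One row per inspection, one column per attribute, 1 where present
sm_toy <- sparseMatrix(i = row_idx, j = col_idx, x = rep(1, nrow(long)),
                       dimnames = list(ID, ME))
print(sm_toy)
```

Keeping the original labels in `dimnames` means the integer IDs never need to be decoded by hand; rows and columns can still be looked up by name.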
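## sm2 is an arules "transactions" object (see class(sm2) below), presumably built from sm; that step is not shown
rules <- apriori(sm2,
parameter = list(minlen=1, supp=0.001, conf=0.4, maxlen=4),
appearance = list(rhs=outcomelist,
default="lhs"),
control = list(verbose=TRUE))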
outcomelist
## [1] "Food Handling" "Rodents/Pests" "Employee Practices"
## [4] "Temperature" "Adminstrative" "Poor Equipment"
## [7] "Food Source"
class(sm2)
## [1] "transactions"
## attr(,"package")
## [1] "arules"
30 rules were generated. The top 5 by lift are:
| | lhs | rhs | support | confidence | lift |
|---|---|---|---|---|---|
| 23 | {BROOKLYN,Caribbean} | {Rodents/Pests} | 0.008 | 0.605 | 1.396 |
| 1 | {Delicatessen} | {Temperature} | 0.010 | 0.599 | 1.353 |
| 5 | {Caribbean} | {Rodents/Pests} | 0.016 | 0.570 | 1.315 |
| 35 | {MANHATTAN,Chinese} | {Temperature} | 0.015 | 0.575 | 1.299 |
| 2 | {Pizza/Italian} | {Temperature} | 0.012 | 0.556 | 1.257 |
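Because lift is confidence divided by the overall rate of the RHS, and confidence is support divided by the share of records matching the LHS, both baseline quantities can be backed out of the table. The sketch below does that arithmetic on the five rows above; the column names `rhs_base_rate` and `lhs_share` are introduced here for illustration, and the results are approximate because the table values are rounded.

```r
# Values copied from the table above
rules_tbl <- data.frame(
  lhs        = c("{BROOKLYN,Caribbean}", "{Delicatessen}", "{Caribbean}",
                 "{MANHATTAN,Chinese}", "{Pizza/Italian}"),
  rhs        = c("Rodents/Pests", "Temperature", "Rodents/Pests",
                 "Temperature", "Temperature"),
  support    = c(0.008, 0.010, 0.016, 0.015, 0.012),
  confidence = c(0.605, 0.599, 0.570, 0.575, 0.556),
  lift       = c(1.396, 1.353, 1.315, 1.299, 1.257),
  stringsAsFactors = FALSE
)

# lift = confidence / P(rhs), so P(rhs) = confidence / lift
rules_tbl$rhs_base_rate <- round(rules_tbl$confidence / rules_tbl$lift, 3)

# confidence = support / P(lhs), so P(lhs) = support / confidence
rules_tbl$lhs_share <- round(rules_tbl$support / rules_tbl$confidence, 3)

print(rules_tbl[, c("lhs", "rhs", "rhs_base_rate", "lhs_share")])
```

The implied base rates come out at roughly 43% for Rodents/Pests and 44% for Temperature, so a confidence near 60% is a moderate increase over the norm, not a rare outcome becoming common.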
plot(rules,method="grouped")
plot(rules,method="paracoord")
plot(rules,method="graph")
Christian Borgelt's website
Various publications and implementations of Association Rules
Efficient Analysis of Pattern and Association Rule Mining Approaches