Preliminary Plots, Exploration, and Observations

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    15.0    27.0   130.1    75.0 10800.0
## Total dollars donated: 83459406
## Total count of donations:  641704
## Total number of donors:  109905

Nearly 110,000 Californians have made 641,704 contributions totaling about $83 million, with a median donation of $27 (sounds familiar!) and a mean donation of $130.
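For reference, here is a minimal base R sketch of how figures like the ones above might be computed. The toy data and the column names (donor, candidate, amount) are hypothetical, and counting donors by unique name is an assumption about how the donor total was derived.

```r
# Toy stand-in for the contributions table; the column names
# (donor, candidate, amount) are hypothetical.
donations <- data.frame(
  donor     = c("A", "A", "B", "C", "C", "D"),
  candidate = c("Sanders", "Sanders", "Clinton", "Cruz", "Clinton", "Sanders"),
  amount    = c(27, 15, 250, 100, 50, 27)
)

total_dollars <- sum(donations$amount)             # total dollars donated
n_donations   <- nrow(donations)                   # one row per donation
n_donors      <- length(unique(donations$donor))   # distinct donors (assumed: by name)

print(summary(donations$amount))                   # min, quartiles, median, mean, max
cat("Total dollars donated:", total_dollars, "\n")
cat("Total count of donations:", n_donations, "\n")
cat("Total number of donors:", n_donors, "\n")
```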

Who is getting the most donations?

Bernie Sanders has received the most individual donations by far, followed by Clinton, Cruz, Carson, and Rubio.

Clinton has accrued the greatest total donations, at nearly $40M. Sanders is a distant second with nearly $20M, followed by Cruz, Rubio, Bush, and Carson.

Filtering out candidates with less than $1M in total donations gives a slightly clearer picture.
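A filter like this could be sketched in base R roughly as follows. The data is made up and the column names are hypothetical; only the $1M threshold comes from the text.

```r
# Toy data with hypothetical column names.
donations <- data.frame(
  candidate = c("Clinton", "Clinton", "Sanders", "Stein"),
  amount    = c(900000, 300000, 1500000, 40000)
)

# Total dollars per candidate.
totals <- aggregate(amount ~ candidate, data = donations, FUN = sum)

# Keep candidates with at least $1M, largest first.
big <- totals[totals$amount >= 1e6, ]
big <- big[order(-big$amount), ]
print(big)
```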

I would like some sense of whom to consider further. California is a very blue state, so I could ask plenty of questions about Clinton and Sanders. The Republican candidates are a little trickier: there are so many of them, and by the time our primary came around, Trump was the only one left on the ticket. So I would like to see how money was being contributed as the different Republicans dropped out of the race.

It looks like a lot of candidates dropped out around February. No Green or Libertarian candidates dropped out, because each of those parties had only one to begin with.

How have donations come in over time?

This plot is very busy and hard to make sense of. I would like to overlay the withdrawals data to see what happens to donations when a candidate withdraws. Do they just slow down, or do they stop completely?

This plot is not any clearer. I want to zoom in.

These graphs indicate that when a campaign ends, donations stop immediately, or in some cases continue for a short time. I am surprised by how many candidates are in the top quartile, which is still pretty crowded and hard to interpret. Perhaps viewing it alone will help.

Several candidates clearly don’t belong in this group. Rand Paul, John Kasich, Ben Carson, etc. are nowhere near Clinton and Sanders. Perhaps the problem is grouping them by their maximum daily total rather than their average daily total.

Using average values distributes the candidates more evenly and makes the top quartile easier to read.
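To illustrate why the choice of statistic matters, here is a small base R sketch. It assumes the grouping was done by cutting each candidate's maximum (or mean) daily total at the quartiles, which is my reading of the text rather than the original code. The candidates and numbers are invented.

```r
# Toy daily totals; candidate "B" has one large spike and otherwise tiny days.
daily <- data.frame(
  candidate = rep(c("A", "B", "C", "D"), each = 3),
  total     = c(100, 120, 110,   5, 5, 200,   50, 60, 55,   10, 12, 11)
)

# Assign each value to a quartile group (1 = lowest, 4 = highest).
quartile_of <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, 0.25))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

max_day  <- tapply(daily$total, daily$candidate, max)
mean_day <- tapply(daily$total, daily$candidate, mean)

groups <- data.frame(
  candidate = names(max_day),
  by_max    = quartile_of(as.numeric(max_day)),
  by_mean   = quartile_of(as.numeric(mean_day))
)
print(groups)
```

In this toy example the spiky candidate "B" lands in the top group when ranked by maximum day but drops a group when ranked by mean, which is the kind of shift described above.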

In the top quartile, Clinton and Sanders dominate the chart, which is no surprise. Perry, Rubio, and Bush all experience a slow-down in donations before withdrawing their candidacy. Cruz, however, shows a sudden spike just before dropping out.
Does this represent one large donation, or many smaller ones?

## Total number of donations:  5726
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1      25      36     115     100   10800
## [1] "Dates he received the max donation: "
## [1] "2016-05-03" "2016-05-03" "2016-04-28"
## Total amount received on the day he resigned:  232258.3
## Max amount recieved on any day of his campaign:  232258.3

It seems that Californians really did not want Trump! Cruz received more money on the day he withdrew than any other day in his campaign. Most of that came from small donations.
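A check like this might look roughly like the following in base R. The amounts are invented, the column names are hypothetical, and the $100 cutoff for a "small" donation is an assumption made only for illustration. The withdrawal date matches the one shown in the output above.

```r
# Toy Cruz donations (invented amounts).
cruz <- data.frame(
  date   = as.Date(c("2016-05-01", "2016-05-02",
                     "2016-05-03", "2016-05-03", "2016-05-03",
                     "2016-05-03", "2016-05-03")),
  amount = c(100, 120, 36, 25, 50, 40, 150)
)
withdrawal <- as.Date("2016-05-03")

# Total received per day, then the day with the largest total.
daily    <- aggregate(amount ~ date, data = cruz, FUN = sum)
peak_day <- daily$date[which.max(daily$amount)]

# Money received on the withdrawal day, and the share from small donations.
on_withdrawal <- daily$amount[daily$date == withdrawal]
small_cutoff  <- 100   # hypothetical definition of a "small" donation
that_day      <- cruz$amount[cruz$date == withdrawal]
small_share   <- sum(that_day[that_day <= small_cutoff]) / on_withdrawal

cat("Peak day:", format(peak_day), "\n")
cat("Received on withdrawal day:", on_withdrawal, "\n")
cat("Share from small donations:", round(small_share, 2), "\n")
```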

It is still difficult to interpret the time graphs, so I will group the data by week and month to see if that clarifies anything.

Aggregating the data by week, we can see that the Democratic candidates experience more variation in contributions than the Republican candidates. By month, it becomes clear that Sanders’ campaign peaked around March, which was the only month in which his donations exceeded Clinton’s. Both dropped after that, but Clinton’s began to rise again after April. This makes sense, as April was around the time that it became clear that Sanders was probably not going to win the Democratic primary.
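One way to do this kind of weekly and monthly roll-up in base R is sketched below. The data and column names are hypothetical. Weeks are labeled by their starting Monday, which is the default for cut() on dates.

```r
# Toy data with hypothetical column names.
donations <- data.frame(
  date      = as.Date(c("2016-03-01", "2016-03-02", "2016-03-09",
                        "2016-04-05", "2016-04-06")),
  candidate = c("Sanders", "Clinton", "Sanders", "Clinton", "Sanders"),
  amount    = c(27, 100, 50, 250, 27)
)

# Label each donation with its month and with the Monday that starts its week.
donations$month <- format(donations$date, "%Y-%m")
donations$week  <- as.character(cut(donations$date, breaks = "week"))

monthly <- aggregate(amount ~ candidate + month, data = donations, FUN = sum)
weekly  <- aggregate(amount ~ candidate + week,  data = donations, FUN = sum)

print(monthly)
print(weekly)
```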

Sanders vs. Clinton

I’m now going to limit my investigation to Sanders and Clinton, since they are the candidates I’m most interested in, as are, apparently, the vast majority of donating Californians.

How did donations track the primaries?

The above plot shows the daily cumulative donations to Clinton (blue) and Sanders (red) alongside their delegate counts (indicated as points). There doesn’t appear to be any strong correlation between wins and donations. Both candidates have received a pretty steady and predictable stream of money.
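The cumulative series behind a plot like this can be built with a per-candidate running sum. Here is a minimal base R sketch with invented numbers and hypothetical column names.

```r
# Toy daily totals per candidate.
daily <- data.frame(
  date      = as.Date(c("2016-03-01", "2016-03-01", "2016-03-02",
                        "2016-03-02", "2016-03-03")),
  candidate = c("Clinton", "Sanders", "Clinton", "Sanders", "Sanders"),
  amount    = c(100, 27, 50, 54, 27)
)

# Sort by date within candidate so the running sum moves forward in time.
daily <- daily[order(daily$candidate, daily$date), ]

# ave() applies cumsum separately within each candidate.
daily$cumulative <- ave(daily$amount, daily$candidate, FUN = cumsum)
print(daily)
```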

What about Trump?

Since ignoring Trump in the hopes he’ll go away has not proven to be an effective strategy, I’ll have a look at his data as well. I’m wondering if Californians have grown more or less supportive over time.

Trump’s support in California is growing, particularly since February, when many of his rivals dropped out.

Who is donating, and how much?

I’m going to zoom back out into the larger dataset and subset it to the largest campaigns for clarity. This will include the top quartile by daily total, as well as Donald Trump since he is the presumptive Republican nominee at this point.

## [1] "Summary of all campaigns:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    15.0    27.0   130.1    75.0 10800.0
## [1] "Summary of the big campaigns:"
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.00    15.00    27.00   124.20    53.55 10000.00

There doesn’t appear to be a great deal of difference between the donations received by the candidates receiving the most money and those received by the candidate pool as a whole.

How much do people typically donate? Do some candidates elicit larger donations than others?

The most common donation amount is $50 for all candidates, and people tend to round their donations to certain predictable values (e.g. $25, $250, $5,000, etc.). Sanders has received far more small donations than Clinton, and it appears that the $27 marketing campaign was a success. Surprisingly, Sanders has also received more very large donations than Clinton, while the bulk of Clinton’s money has come in $100-$3,000 increments.
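Finding the most common donation amount per candidate amounts to taking the mode of each candidate's amounts. A minimal base R sketch, using invented data and hypothetical column names:

```r
# Toy data; amounts are invented.
donations <- data.frame(
  candidate = c("Sanders", "Sanders", "Sanders", "Sanders",
                "Clinton", "Clinton", "Clinton"),
  amount    = c(27, 27, 50, 27, 50, 100, 50)
)

# Most frequent value in a numeric vector (ties go to the smallest amount,
# because table() sorts its names and which.max() takes the first maximum).
mode_amount <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

most_common <- tapply(donations$amount, donations$candidate, mode_amount)
print(most_common)
```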

What about geography?

Unsurprisingly, donations are concentrated in the major metropolitan areas, with no distinguishable difference in pattern between count, total donations, and mean donation size. But I wonder how the pattern might change if I isolate the data by candidate or party.

These plots surprised me a little bit. I expected the donations for Clinton and Sanders to be concentrated in the major metropolitan areas (read: more liberal), but I am very surprised to see that Trump has also received the lion’s share of his money, albeit much less, from the San Francisco and Los Angeles regions. I had expected him to receive more money from the more rural Central Valley. I also noticed that Sanders’ money has been better distributed throughout the state than that of either of his competitors.

Final Plots

Donations Over Time

This plot shows the campaign donations over time of the largest campaigns, with their drop-out date, if applicable, shown as a point. Most notable is Ted Cruz’s campaign, which received more money on the day he dropped out than any other day of his campaign. No other candidate received such a spike.
This plot also shows some remarkable spikes in Marco Rubio’s campaign occurring in July and October of last year. I am curious about these sudden influxes of money.

Donation Size

I chose this plot as a clear reflection of what social media has been telling me for months: Bernie Sanders receives far more donations than any other candidate, and the average donation amount is $27. I suspect that this marketing has made the average the mode, as this plot depicts. It would be very interesting to do a time series of different donation amounts against the Sanders marketing campaign; however, this would require another data set.

Donations by Location

This plot was really surprising to me. I went in with the assumption that Trump would be receiving most of his money from the more rural Central Valley. But to the contrary, like his Democratic competitors, most of his money is coming from the San Francisco and Los Angeles areas. This makes me very curious about who these people are. I live in Silicon Valley and have not seen any support for Trump around my community or on my social media. However, the data indicate that Trump’s support, though smaller than the other candidates’, is concentrated in my area.

Reflection

I really enjoyed this project, especially when the graphs offered insights that were not what I expected. It was neat to work with such a large data set and problem solve in areas that were totally new. Here were the most frustrating and worthwhile challenges:

Challenges and Solutions

  • Getting the data loaded properly
    • I can’t really recall why this was such a problem at this point, or what the solution was.
  • My column headers were wrong so I had to reset them all. I knew how to do this but had some syntax issues.
  • Getting the date format correct took a great deal of trial and error.
    • The documentation for this is extremely simple and straightforward, but I kept getting a vague error message.
    • After what felt like endless googling and swearing, I put quotes around the format indicator (which none of the documentation or example showed) and that solved the problem.
  • Getting a count of donations by groups
    • The count function (I also tried tally) did not work for me. I did not take very much time to troubleshoot it.
    • I cheated and divided the sum by the mean to get count.
  • Getting the map of California bounded properly
    • Loading the map was relatively easy, as was looking up the appropriate coordinates.
    • For some reason, the coordinates (which I verified outside of R) landed me somewhere in the Philippines.
    • I triple checked my syntax, and there were definitely no mistakes in terms of how the coordinates were entered.
    • I ended up just mapping “California” and then using the zoom to show the appropriate area.
  • Setting up the data frame for the map by candidate
    • My original coordinates data frame did not have the city names. This worked fine for the first set of maps, but when I broke it down by candidate I had to be able to match the city to the coordinates.
    • It took me several tries to figure out how to do this, and the request limitations of the geocoord API delayed my project a couple of days.
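Two of the challenges above, date parsing and counting donations by group, can be sketched in base R as follows. This is an illustration rather than the code used in this project: the data, the column names, and the date layout ("%m/%d/%Y") are all hypothetical. The one thing to note is that the format argument to as.Date() is a quoted string.

```r
# Toy raw data; dates arrive as text.
raw <- data.frame(
  candidate = c("Sanders", "Sanders", "Clinton"),
  date_text = c("03/01/2016", "04/15/2016", "02/20/2016"),
  amount    = c(27, 50, 100),
  stringsAsFactors = FALSE
)

# The format specification must be passed as a quoted string.
raw$date <- as.Date(raw$date_text, format = "%m/%d/%Y")

# Direct count of donations per candidate.
n_direct <- tapply(raw$amount, raw$candidate, length)

# The sum / mean workaround described above gives the same counts,
# but it breaks down if a group's mean is zero.
n_workaround <- tapply(raw$amount, raw$candidate, sum) /
                tapply(raw$amount, raw$candidate, mean)

print(n_direct)
print(n_workaround)
```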

Successes

  • I am most proud of the map plots in this project. I had never worked with the maps API before, and was able to figure it out on my own using web resources. I ran into a lot of little problems, some described above, and managed to solve them all.
  • I also spent a lot of time thinking about how to best represent data with such a wide range of values. I used two different strategies: axis transformations, and splitting up the data into multiple plots. I feel that I used both effectively, and that all my plots are easy to read as a result.

Ideas for Further Investigation

  • I’m really curious about the apparent cyclical trend and spikes in the donations over time graphs. I would really like to see what correlates to the peaks. Is it social media trends? Certain hashtags or marketing? Do people tend to donate on a particular day of the week or month?
  • Who are the people giving money to the different candidates? I broke them down by the city they live in, but I overlooked the other information I had. Are there correlations by occupation or employer? For example, I was surprised to find that my assumption about where Trump’s supporters live is false. So I also wonder if some other stereotypes I’ve been holding are also false - white collar vs. blue collar, for example.
  • I really ignored the third party candidates in this investigation because they have received so little money. But it would be interesting to see where Stein and Johnson are getting their donations from.
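As a starting point for the day-of-week question in the first idea above, donations could be totaled by weekday along these lines. This is a base R sketch with invented data and hypothetical column names. The weekday is taken from as.POSIXlt() so the result does not depend on the system locale.

```r
# Toy data; 2016-03-07 and 2016-03-14 are Mondays.
donations <- data.frame(
  date   = as.Date(c("2016-03-07", "2016-03-08", "2016-03-14",
                     "2016-03-15", "2016-03-15")),
  amount = c(27, 50, 27, 100, 27)
)

# $wday runs from 0 (Sunday) to 6 (Saturday).
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
wday <- as.POSIXlt(donations$date)$wday
donations$weekday <- factor(day_names[wday + 1], levels = day_names)

# Total dollars per weekday; weekdays with no donations come back as NA.
by_weekday <- tapply(donations$amount, donations$weekday, sum)
print(by_weekday)
```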