Stream-Processing In A

High Traffic Environment

From Startup To Post-Acquisition

Daniel Pezely
Formerly with Splunk/BugSense
(now with Snagz.net)

Data Pipelines & Distributed Systems Meetup
at STAT Search Analytics, Vancouver, BC

24 November 2016


Context

We’ll look inside that “cloud” icon, both before and after BugSense
was acquired and became Splunk MINT

[image from About Splunk MINT]


Background

BugSense founded in 2011 in Athens, Greece

Total funding was USD $100k plus a hosting grant

Was #2 (behind Google) for crash reporting & analytics of mobile apps

Handled many billions of requests daily, non-stop from around the world

Acquired by Splunk in 2013 with headcount of 12
via connection made six months earlier at Erlang Factory SF 2013

The BugSense offering became a Splunk App, now called MINT:
Mobile Intelligence (Splunk.com/mint)

( Splunk also acquired Metaphor Software in 2015,
maintaining an office presence in Vancouver )


Bio

Daniel joined BugSense after its acquisition.
Maintained Lethe database & stream-processing system
plus co-developed Data Collector, both of which
are presented here.

Involved in many startups and large companies:
Main Street, Wall Street, Silicon Valley, Sydney,
and places in between; now in Vancouver.

Currently:

Founder and principal developer of Snagz.net,
a document and data-mining system for datasets that
may benefit from more than Machine Learning alone.


Part 1:

Original BugSense Architecture

Launch through early days of acquisition

( Before BugSense gets re-written as a Splunk App )


Criteria for LetheDB

Handle crash reporting & analytics of mobile devices:


LetheDB

We’ll focus on the box highlighted:


Design & Implementation


Key Feature: Multi-Tenancy

One Erlang “process” per customer:

One set of tables per customer:

Per-customer tables persisted for N+1 days(*) of retention:

Originally tracked full history:

(*) N+1 days of retention because of the modulo naming scheme for re-using files on disk and “atoms” within the Erlang VM. One day of padding keeps the semantics of midnight/date rollover simple, given late-arriving writes versus reads for report generation.
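
The modulo naming scheme in the footnote can be sketched roughly as follows. This is a minimal illustration in Python rather than Erlang, and it assumes that a day's ordinal number is reduced modulo N+1 to pick a reusable slot. The function names and the `customer_day_K` naming convention are hypothetical; the actual LetheDB file and atom names are not given here. Because Erlang atoms are never garbage-collected, cycling through a fixed set of N+1 names keeps the atom table bounded.

```python
from datetime import date

def slot_for(day, retention_days):
    """Map a calendar day onto one of N+1 reusable slots."""
    return day.toordinal() % (retention_days + 1)

def table_name(customer, day, retention_days):
    # Hypothetical naming convention, for illustration only.
    return f"{customer}_day_{slot_for(day, retention_days)}"

def readable_days(today, retention_days):
    """The N most recent complete days that a report may read."""
    return [date.fromordinal(today.toordinal() - k)
            for k in range(1, retention_days + 1)]

today = date(2016, 11, 24)
n = 7
# Today's writes land in a slot that none of the N readable days use,
# so late-arriving writes around midnight never collide with report reads.
report_slots = {slot_for(d, n) for d in readable_days(today, n)}
print(slot_for(today, n) in report_slots)
```

With N+1 slots, the slot being written today is the same one that held data from N+1 days ago, which is exactly the data that has just aged out.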


Part 2:

New Architecture After Acquisition

BugSense re-written as a Splunk app


Criteria for Data Collector


Data Collector


Design & Implementation

First version (MVP) was single-tenant, second added multi-tenancy:

Processes within bundle:(*)

More Erlang, better Erlang:

(*) In Erlang, each “process” gets one mailbox. Adding a companion process can help with timely responses, such as when the other blocks on I/O.
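
The companion-process idea in the footnote can be illustrated with a small sketch. It uses Python threads and queues as stand-ins for Erlang processes and mailboxes, which is an assumption made purely for illustration. The names `worker`, `companion`, and `demo` are hypothetical, and the shared `status` dictionary is a simplification (Erlang processes share no memory and would exchange messages or use a table instead).

```python
import queue
import threading

def worker(mailbox, io_gate, started, status):
    """Main process: handles jobs but may block on slow I/O."""
    while True:
        msg = mailbox.get()
        if msg == "stop":
            break
        status["state"] = "busy"
        started.set()
        io_gate.wait()  # stands in for a blocking I/O call
        status["handled"] += 1
        status["state"] = "idle"

def companion(mailbox, status):
    """Companion process: its own mailbox, so status queries never wait on I/O."""
    while True:
        msg = mailbox.get()
        if msg == "stop":
            break
        kind, reply_to = msg
        if kind == "status":
            reply_to.put(dict(status))

def demo():
    status = {"state": "idle", "handled": 0}
    work_box, side_box, replies = queue.Queue(), queue.Queue(), queue.Queue()
    io_gate, started = threading.Event(), threading.Event()
    t1 = threading.Thread(target=worker, daemon=True,
                          args=(work_box, io_gate, started, status))
    t2 = threading.Thread(target=companion, daemon=True,
                          args=(side_box, status))
    t1.start()
    t2.start()
    work_box.put("job")
    started.wait(timeout=5)            # worker is now blocked on "I/O"
    side_box.put(("status", replies))  # the companion still answers
    during = replies.get(timeout=5)
    io_gate.set()                      # let the "I/O" complete
    work_box.put("stop")
    side_box.put("stop")
    t1.join()
    t2.join()
    return during, dict(status)
```

Calling `demo()` returns the status reported while the worker was blocked, followed by the final status after the job completes.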


Key Feature: Pipeline With Multiple Priorities

Many systems offer only a uni-directional queue
because it is simpler to implement
and avoids an entire class of deadlock scenarios.

But that’s for general purpose systems!

You know more about your subject domain
than the library or framework author:

Consider a custom Bi-directional queue/mailbox approach:
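
One possible reading of a bi-directional mailbox, offered as an assumption rather than as the Data Collector's actual design, is a double-ended queue: normal messages join at the tail, while urgent messages enter near the head. The class and method names below are hypothetical, and the sketch is in Python rather than Erlang.

```python
from collections import deque

class TwoEndedMailbox:
    """Single queue with two entry points: tail (normal) and head (urgent)."""

    def __init__(self):
        self._items = deque()
        self._urgent = 0  # count of urgent messages currently at the head

    def send(self, msg):
        """Normal traffic: append at the tail."""
        self._items.append(msg)

    def send_urgent(self, msg):
        """Urgent traffic: insert behind earlier urgent messages,
        ahead of all normal ones, so each class stays first-in, first-out."""
        self._items.insert(self._urgent, msg)
        self._urgent += 1

    def receive(self):
        """Take the next message from the head, or None when empty."""
        if not self._items:
            return None
        if self._urgent:
            self._urgent -= 1
        return self._items.popleft()

box = TwoEndedMailbox()
box.send("a")
box.send("b")
box.send_urgent("x")
box.send_urgent("y")
box.send("c")
order = [box.receive() for _ in range(5)]
print(order)  # ['x', 'y', 'a', 'b', 'c']
```

Knowing the subject domain is what makes this safe: the author of the pipeline can decide which message kinds may jump the queue, something a general-purpose library cannot know.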


Part 3:

Lessons Learned


Founder’s Perspective

Build a deployable Minimum Viable Product (MVP)
Or as some call it, “minimum valuable product”

Plan to be #FundedByRevenue; more funding then simply helps you
grow faster and into more markets

(From a tech co-founder’s perspective, this is similar to
designing without a commercial Load Balancer — knowing that
you can always add one for more headroom later.)

also

Hire employees who are entrepreneurial-minded

Being entrepreneurial doesn’t necessarily mean that this person will be CEO. It simply implies a mindset of:


Product Manager’s Perspective

It’s all about managing complexity without being complicated:

Build versus buy versus use open source:

Consider that as an early-stage startup, you are continually
discovering the problem space as well as the solution space:

Other non-tech criteria to consider:


Software Developer’s Perspective

Simple layers yield rich behaviour:

“Let it crash” used with great success:

Ability to mirror production traffic was a huge win:

Programming Methodology:

Use of “exotic” programming languages can be a strategic advantage


Multi-Tenancy For Multiple Priorities

Combining the two systems architectures…

Multi-tenant mechanics may also be used for priority messaging!

Instead of managing only by customer, or by message versus Out-Of-Band (OOB),
augment with attributes such as those from a Service Level Agreement (SLA)

Offers more knobs & levers for scaling or controlling capacity

Not just for scaling up… but also scaling down later:
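
Reusing multi-tenant mechanics for priorities can be sketched as a router whose "tenant" key is any message attribute, such as an SLA tier. This is a hedged illustration in Python, not the production design: `LaneRouter`, the `sla` field, and the per-round `shares` are all hypothetical names chosen for the example. The `shares` mapping plays the role of a knob that can be turned up or down at runtime.

```python
from collections import deque

class LaneRouter:
    """Per-tenant routing where the tenant key is an arbitrary attribute."""

    def __init__(self, key_fn, shares):
        self.key_fn = key_fn
        self.shares = dict(shares)  # lane -> messages served per round
        self.lanes = {lane: deque() for lane in self.shares}

    def submit(self, msg):
        lane = self.key_fn(msg)
        if lane not in self.lanes:
            raise ValueError(f"unknown lane: {lane!r}")
        self.lanes[lane].append(msg)

    def drain_round(self):
        """Serve each lane up to its share; return messages in service order."""
        served = []
        for lane, share in self.shares.items():
            q = self.lanes[lane]
            for _ in range(min(share, len(q))):
                served.append(q.popleft())
        return served

router = LaneRouter(lambda m: m["sla"], {"gold": 2, "basic": 1})
for msg_id, tier in [("g1", "gold"), ("b1", "basic"), ("g2", "gold"),
                     ("g3", "gold"), ("b2", "basic")]:
    router.submit({"id": msg_id, "sla": tier})

first = [m["id"] for m in router.drain_round()]   # gold gets two slots
router.shares["gold"] = 1                         # scale the gold lane down
second = [m["id"] for m in router.drain_round()]
print(first, second)  # ['g1', 'g2', 'b1'] ['g3', 'b2']
```

Changing a lane's share is the same operation whether capacity is being added or withdrawn, which is what makes scaling down as straightforward as scaling up.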


For More Information

Full series of original presentations:

  1. http://highscalability.com/blog/2012/11/26/bigdata-using-erlang-c-and-lisp-to-fight-the-tsunami-of-mobi.html
  2. http://www.erlang-factory.com/conference/SFBay2013/speakers/DionisisKakoliris
  3. http://www.erlang-factory.com/sfbay2014/jon-vlachogiannis
  4. http://www.erlang-factory.com/sfbay2015/daniel-pezely
  5. http://www.erlang-factory.com/sfbay2016/panagiotis-papadomitsos
  6. https://github.com/priestjim/gen_rpc

Much credit for the success of LetheDB (LDB) and Data Collector goes to:
Panagiotis “PJ” Papadomitsos, Linkedin.com/in/priestjim, @priestjim

Founders of BugSense:

BugSense is now Splunk Mobile Intelligence (MINT)
Splunk.com/mint
docs.splunk.com/Documentation/Mint