Takeaways from QCon London 2017 – Day 3


Here’s day 3. Day 1 can be found here and Day 2 can be found here.

The Talks

  1. Avoiding Alerts Overload From Microservices with Sarah Wells
  2. How to Backdoor Invulnerable Code with Josh Schwartz
  3. Spotify’s Reliable Event Delivery System with Igor Maravic
  4. Event Sourcing on the JVM with Greg Young
  5. Using FlameGraphs To Illuminate The JVM with Nitsan Wakart
  6. This Will Cut You: Go’s Sharper Edges with Thomas Shadwell

Avoiding Alerts Overload From Microservices

  • Actively slim down your alerts to only those for which action is needed
  • “Domino alerts” are a problem in a microservices environment — one service goes down and all dependent services fire alerts
  • Uses Splunk for log aggregation
  • Dashing mentioned for custom dashboards
  • Graphite and Grafana mentioned for metrics
  • Use transaction IDs (UUIDs) in request headers to tie together all the calls made on behalf of one request
  • Each service should report its own health via a standard “health check endpoint”
  • All errors in a service are logged and then graphed
  • Rank the importance of your services – Should you be woken up when service X goes down?
  • Have “Ops Cops” — Developers charged with checking alerts during the day
  • Deliberately break things to ensure alerts are triggered
  • Only services containing business logic should alert
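
To make the transaction ID idea concrete, here is a minimal sketch of how a service might reuse or mint an ID and pass it downstream. The header name and the helper functions are my own assumptions for illustration; the talk notes only say that a UUID travels in the request headers.

```python
import uuid

# Hypothetical header name: an assumption, not something named in the talk.
TRANSACTION_HEADER = "X-Transaction-Id"


def ensure_transaction_id(incoming_headers):
    """Reuse the caller's transaction ID, or mint a new UUID if none was sent."""
    existing = incoming_headers.get(TRANSACTION_HEADER)
    return existing if existing else str(uuid.uuid4())


def outgoing_headers(transaction_id, extra=None):
    """Build headers for a downstream call that carry the same transaction ID."""
    headers = dict(extra or {})
    headers[TRANSACTION_HEADER] = transaction_id
    return headers


def log_line(service, transaction_id, message):
    """Format a log line that a log aggregator can search by transaction ID."""
    return f"service={service} transaction_id={transaction_id} msg={message}"
```

With every service logging the same ID, a single search in the log aggregator should pull up the whole chain of calls for one request.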

How to Backdoor Invulnerable Code

  • A highly enjoyable talk of infosec war stories.

Spotify’s Reliable Event Delivery System

  • The Spotify clients generate an event for each user interaction
  • The system is built on guaranteed message delivery
  • Runs on Google Cloud Platform
  • Hadoop and Hive used on the backend
  • Events are dropped into hourly “buckets”
  • Write it, run it culture
  • System monitoring covers data monitors (message timeliness SLAs) and auditing (100% delivery)
  • Microservices-based system
  • Uses Elasticsearch + Kibana
  • Uses CPU-based autoscaling with Docker
  • All services are stateless — cloud pub/sub
  • Machines are built with Puppet for legacy reasons
  • Apparently, Spotify experienced a lot of problems with Docker — at least once an hour
  • Services are written in Python
  • Looking to investigate Rocket (rkt) in the future
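
The hourly “buckets” idea can be sketched roughly as below. This is only an illustration: the bucket key format, the event shape and the choice to bucket by the event's own UTC timestamp are all my assumptions, not details from the talk.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_bucket(timestamp):
    """Return a key for the UTC hour that a timezone-aware timestamp falls in."""
    utc = timestamp.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H")


def group_by_hour(events):
    """Group events (dicts with a 'timestamp' key) into hourly buckets."""
    buckets = defaultdict(list)
    for event in events:
        buckets[hourly_bucket(event["timestamp"])].append(event)
    return dict(buckets)
```

Once an hour has closed, its bucket can be handed to batch jobs as a complete unit, which is presumably where the timeliness SLAs come in.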

Event Sourcing on the JVM

  • Event sourcing is inherently functional
  • A single data model is almost never appropriate; event sourcing can feed many models and keep them in sync, e.g. an RDBMS, a NoSQL store and a graph database
  • Kafka can be used as an event store by configuring it to persist data for a long time, although this isn’t what it is currently intended to do
  • Event Store mentioned
  • Axon Framework mentioned: mature
  • Eventuate mentioned: great for distributed environments/geolocated data
  • Akka Persistence mentioned: great, but needs other Akka libraries
  • Reactive Streams will be a big help when dealing with event sourcing
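
The “inherently functional” point and the “feed many models” point can both be seen in a small sketch: state is just a fold over the event log, and different folds over the same log give different read models. The bank account events and function names here are hypothetical, chosen only for illustration.

```python
from functools import reduce

# Hypothetical event log: the single source of truth.
EVENTS = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]


def apply_balance(balance, event):
    """Read model 1: the current account balance."""
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance


def apply_counts(counts, event):
    """Read model 2: how many events of each type have occurred."""
    updated = dict(counts)
    updated[event["type"]] = updated.get(event["type"], 0) + 1
    return updated


def project(events, apply, initial):
    """Build a read model by folding a pure function over the event log."""
    return reduce(apply, events, initial)
```

Calling `project(EVENTS, apply_balance, 0)` and `project(EVENTS, apply_counts, {})` replays the same log into two independent models, and either one can be rebuilt from scratch at any time.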

Using FlameGraphs To Illuminate The JVM

  • Base performance on requirements
  • Flamegraphs come out of Netflix
  • Visualisation of profiled software
  • Java stacks must be collected first
  • Java VisualVM mentioned
  • Linux perf mentioned
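
As a rough idea of what happens between collecting stacks and drawing the graph: flame graph tooling typically works from “folded” stacks, where each unique stack becomes one line of semicolon-separated frames followed by a sample count. The sketch below assumes the samples are already available as lists of frame names, root first; how you collect them is up to your profiler.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples (lists of frames, root first) into folded lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]
```

The width of each box in the final graph is proportional to these counts, so the widest towers are where the samples (and usually the time) went.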

This Will Cut You: Go’s Sharper Edges

  • In some cases Go can be made to crash by feeding it input (JSON, XML, etc.) that lacks closing tags: it just keeps trying to read forever (a DoS attack)
  • Go doesn’t enforce an upload size limit by default; to mitigate this, put your Go servers behind a proxy with an upload size limit, e.g. NGINX or Apache HTTP Server
  • Go doesn’t have CSRF protection built in; this must be added manually
  • DNS rebinding attacks may be possible against Go servers

That about wraps it up for my summary of QCon London 2017.