Big Data is Dead - notes.a10y.dev

#reading-list

**URL:** https://www.youtube.com/watch?v=lisIQ9ohU8g

## Summary

- Presenter: Jordan Tigani, co-founder of MotherDuck (DuckDB). Formerly a lead engineer on BigQuery, then at SingleStore.
- Since the 2000s, Big Data has dictated the design of data systems.
- Big Data as an idea is flawed for several reasons:
  - The 90th percentile of BigQuery users had a working set of <100MB.
  - Scaling out used to be cheaper than scaling up. Now baseline machines are 32x more powerful, and in the cloud, buying 2 machines costs the same as buying one machine with 2x the power.
  - Big Data is a liability, both financially (storing everything forever is scary) and legally (GDPR means you need to know everything in your warehouse, and you need retention policies to avoid keeping confidential/sensitive data that could be subpoenaed).
  - As machines have gained enough memory to fit most working sets and standard hardware has become more powerful, moving data to compute is now often cheaper than moving compute to data.
- DuckDB provides an analytic DB as a library. Users can embed it in scripts and run it on a laptop, on cloud hosts, on-prem, etc. The query engine takes advantage of modern hardware, from SIMD and large memories to new columnar formats and less expensive form factors.
- **Duffy Editorial**: all of these things are true, and it’s great to get something that’s Spark-quality as a query executor without needing all of Spark (which is heavyweight, even if you’re running a local cluster). But most of the value is downstream of Big Data. DuckDB is only going to be valuable if it unlocks totally new ways of working with data, which things like WASM embedding, by enabling new form factors for distributing compute, may do.