-- You're doing analytics wrong -- AMA w/ CEO of Metabase
Hey everyone -
I’m the founder and CEO of Metabase and I’m here to shill our upcoming launch of our Embedding versions.
Over the last 10 years, we at Metabase have been focused on offering an easy and Open Source way for anyone in your company to run reports, make charts, and just generally have access to your company’s data on their own.
We’ve been used by over 70,000 companies and organizations of all sizes and shapes - from Fortune 100 companies to local volleyball clubs.
Analytics is a multi-player game, and while analysts, data scientists, and other practitioners of the dark arts are important players, we’ve always emphasized the need for a normal person with a day job to be able to get what they need on their own time. Increasingly, we’re finding that companies are using us to deliver data and analytics to their customers, not just their employees. We’ve been working to make that easier and faster to do, and are calling the result “Metabase Embedding”.
Along the way, we’ve learned a lot by helping hundreds of companies embed analytics and reporting in their products. This often involves more than just a chart here or a dashboard there, but rather the ability for your customers to run their own reports on the data they generated in your application.
Ask me anything about Metabase, creating an Open Source business, why GitHub stars are a silly vanity metric (and what you should be chasing instead), embedding reporting in your application, or data in general.
-Sameer
Replies
What metrics define the success of an open source project, according to you?
Metabase
@vamsi_peri I'd say the most important target to hit if you're looking for a large, growing open source project is to grow the core set of people who are using you in production (ideally happily). Walking that backwards, you'd also want to keep track of the number of installations, and the rough number of people trying you out (from the number of downloads or Docker Hub pulls). GitHub exposes some of these metrics, Docker Hub another set, and you'll probably want to host releases on your own servers so you get accurate counts there as well.
If you're an application, then some kind of (opt-in) FE client tracking is also ideal so you get a sense of what in the world people are doing inside the application.
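A minimal sketch of walking that funnel backwards, assuming you have already fetched the JSON from GitHub's "list releases" endpoint (each release asset carries a `download_count` field). The function names, the `funnel` helper, and the installation and production-user numbers are hypothetical, for illustration only:

```python
from typing import Any


def total_release_downloads(releases: list[dict[str, Any]]) -> int:
    """Sum download_count over every asset of every release in the payload."""
    return sum(
        asset.get("download_count", 0)
        for release in releases
        for asset in release.get("assets", [])
    )


def funnel(downloads: int, installations: int, production_users: int) -> dict[str, float]:
    """Conversion rates from downloads to installations to production use."""
    # Guard against division by zero for brand-new projects.
    install_rate = installations / downloads if downloads else 0.0
    production_rate = production_users / installations if installations else 0.0
    return {"install_rate": install_rate, "production_rate": production_rate}


# Hypothetical payload shaped like the GitHub releases response.
sample_releases = [
    {"tag_name": "v1.0", "assets": [{"name": "app.jar", "download_count": 120},
                                    {"name": "app.tar.gz", "download_count": 30}]},
    {"tag_name": "v1.1", "assets": [{"name": "app.jar", "download_count": 50}]},
    {"tag_name": "v1.2", "assets": []},
]

downloads = total_release_downloads(sample_releases)
print(downloads, funnel(downloads, installations=50, production_users=10))
```

The installation and production numbers have to come from your own (opt-in) telemetry; GitHub and Docker Hub only cover the top of the funnel.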
What would you do differently if you were starting the product from scratch today?
Metabase
@ramiro__nd The most straightforward thing is that since NLP is basically a solved problem, we'd lean heavily on natural language for querying (note: we're doing this now with our "Metabot").
Bigger picture, I'd have built at least a "preview" paid version of Metabase much earlier and validated it. Making a successful open source project is hard, making a commercial software company is hard, doing both is significantly more annoying. I think understanding how you'll commercialize (read: testing out approaches and validating them) is something you really need to have sorted out early in the process. We got lucky in that a couple of our early big bets there ended up working, but those could easily have not worked out.
Purposeful Poop
(I don't know a ton about the landscape, so sorry if some of these questions are dubious!)
Pardon my ignorance, but when I think about open source analytics my mind goes to Plausible (only because I used it in the past). Can you help me understand how you position yourself relative to them?
I'm currently using GoatCounter on my side project as well (which is extremely lightweight). Again, pardon my ignorance, but what is the "killer feature" I should want?
Thanks for the AMA!
Metabase
@catt_marroll This question is a really common one, and was the bane of our existence early on.
Metabase goes on top of a database and lets you run queries, make charts and dashboards, set up alerts, etc. on the data inside the database. Plausible (and Google Analytics, Mixpanel, etc.) are a combination of a front-end library to instrument your application, storage, and much simpler querying tools to do what is called "event" or "usage" analytics. Think of that as a fairly specialized use case, while with Metabase you can slice and dice any data you have lying around.
Which role in this multiplayer game do you consider AI to play when it comes to analytics?
Metabase
@hervelabas Today, the extremely sloppy intern in the analytics department whose work needs to be triple checked.
In a few foundational model generations, I think it should end up replacing human analysts in the "I have a question" -> "here's the answer" loop, and probably in the "this thing is confusing, can you dig into what's happening?" problem as well.
What are the key differentiators of your embedding product and why should I pick it over other competitors?
Metabase
@maxzhengx Fundamentally, I think we're a lazier option than all the alternatives. If you do a bit of front-loaded work on sorting out what the customer-facing schema should look like, you can get something in front of your customer in a week or two. Most people agonize over pixels and completely ignore the shape of their data; we think you should do the inverse, and we nudge you that way in the product.
What was the biggest “aha” moment you had while building Metabase Embedding?
Metabase
@_roman How often whether a given customer manages to ship, how long it takes them to ship, and whether their customers then use the in-app reporting all hinge on the data schema and on whether the average user of the application understands what they're looking at. Too often analysts or data engineers design schemas for other data practitioners, and don't really understand the frame of mind of a normal user of their application.
Todogs
How are you planning to support customer-driven custom analytics in a multi-tenant setup without overloading the shared database?
Metabase
@aojedal Database (or data warehouse) load isn't a magical thing we're going to solve in Metabase itself. At some point, you need to figure out what data sets to expose, what specific forms of questions you expect your users to ask, the indexing patterns for the row-level per-customer data controls, etc.
At very low scale, having customers hit the application database probably makes sense. But at any realistic customer volume, you really should provision a separate database dedicated strictly to serving customer-facing analytics. Caching will help you for a bit, but it's a long-term dead end.
We strongly suggest treating in-app analytics as a customer-facing feature. For customer-facing features, performance is really important, and you should optimize the DB schema to support that. This almost certainly means moving away from your main application database and creating custom tables that map more closely to how your customers think about the objects in your application.
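As a rough illustration of that last point (not Metabase's implementation), here is a minimal sketch of a dedicated reporting table with a per-customer index and a row-level filter, using an in-memory SQLite database as a stand-in for a separate analytics database. The table name `customer_orders_report`, its columns, and the sample rows are all hypothetical:

```python
import sqlite3

# Hypothetical pre-aggregated rows: (customer_id, order_day, order_count, revenue_cents)
SAMPLE_ROWS = [
    (1, "2024-01-01", 3, 4500),
    (1, "2024-01-02", 1, 1200),
    (2, "2024-01-01", 7, 9900),
]


def build_reporting_db() -> sqlite3.Connection:
    """Create a stand-in analytics database, separate from the application DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_orders_report (
            customer_id   INTEGER NOT NULL,
            order_day     TEXT    NOT NULL,
            order_count   INTEGER NOT NULL,
            revenue_cents INTEGER NOT NULL
        );
        -- Index leads with customer_id so per-customer filters stay cheap.
        CREATE INDEX idx_report_customer_day
            ON customer_orders_report (customer_id, order_day);
        """
    )
    conn.executemany(
        "INSERT INTO customer_orders_report VALUES (?, ?, ?, ?)", SAMPLE_ROWS
    )
    return conn


def revenue_by_day(conn: sqlite3.Connection, customer_id: int) -> list[tuple[str, int]]:
    """Return one customer's daily revenue; the customer filter is always applied."""
    cur = conn.execute(
        "SELECT order_day, revenue_cents FROM customer_orders_report "
        "WHERE customer_id = ? ORDER BY order_day",
        (customer_id,),
    )
    return cur.fetchall()


conn = build_reporting_db()
print(revenue_by_day(conn, 1))
```

The table is shaped around what a customer would ask ("my revenue per day") rather than around the application's normalized order tables, and every read goes through the `customer_id` filter.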
How do you think about building a business on open source code? Do you feel like you "lose" revenue to the self-hosted version? Or is the increased exposure "worth it"? Is that the right way to even frame it? What are the benefits and drawbacks?
Metabase
@steveb Fundamentally, you're giving away software that will cost you a lot to build in exchange for increased exposure, third parties that will be invested in your project and help (code or otherwise), and an increased amount of trust.
It's a difficult equation to balance, and historically most people came up negative.
For us, being Open Source has given us a lot in terms of credibility, talented engineers excited to work on the project, and breadth of exposure. One thing that's made it all work is that there was a somewhat clear vision of what we were going to sell from early on. I do think you have to either be fully committed to being OSS or not at all. I haven't seen a lot of people who've waffled between the two (especially in the early years) succeed. There are a lot of pivotal early decisions you should be making very differently if you're Open vs Closed source (e.g., with regard to user feature requests, support, etc.).
What are some particularly challenging or gnarly things if I build customer facing analytics in-house?
Metabase
@vamsi_peri I mentioned in another comment that going from "I'll just add this one chart" to realizing you need to handle generic queries is the biggest jump in complexity.
The other two main difficulties are getting the schema you present to end users right and making the whole experience performant enough that it's not a punishment to your users.
If you’ve tried to embed analytics, what was your biggest headache?
Metabase
@isha_nasir The biggest headache is that typically after putting in a single chart or two, you start getting requests from your customers about what data they need. It's extremely common for these requests to be extremely different, and all "urgent". At some point, you need to accept that you need to build a generic querying interface and this tends to be a really big jump in complexity and headaches.
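To give a feel for where that complexity starts, here is a minimal sketch of the first step of a generic querying interface: a query builder that only accepts allowlisted dimensions and metrics and always scopes results to one customer. This is an illustration under stated assumptions, not how Metabase does it; the table `customer_orders_report`, the column names, and `build_report_query` are hypothetical:

```python
# Hypothetical allowlists: only these columns and aggregates may reach the SQL text.
ALLOWED_DIMENSIONS = {"order_day", "product_category"}
ALLOWED_METRICS = {
    "order_count": "SUM(order_count)",
    "revenue_cents": "SUM(revenue_cents)",
}


def build_report_query(dimension: str, metric: str, customer_id: int) -> tuple[str, tuple[int]]:
    """Build a parameterized, per-customer aggregate query from user choices."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dimension!r}")
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    # Identifiers come only from the allowlists above; the customer id is
    # passed as a bound parameter, never interpolated into the SQL string.
    sql = (
        f"SELECT {dimension}, {ALLOWED_METRICS[metric]} AS {metric} "
        "FROM customer_orders_report "
        "WHERE customer_id = ? "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return sql, (customer_id,)


sql, params = build_report_query("order_day", "revenue_cents", customer_id=1)
print(sql)
print(params)
```

Even this toy version already needs validation, tenant scoping, and a schema to build against; filters, joins, date bucketing, and permissions are where the real jump in effort comes from.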