Posted by: Mo Hallaba, CEO

AI Data Analytics - Datawisp vs. the Big Players

Overview

With all the hype around AI and the fact that every data company uses the same exact words to describe its products, it can be really hard to distinguish marketing fluff from real performance. Just go to any data convention exhibit floor and you’ll see swaths of booths plastered with the same promises of making your company more “data driven” and “increasing efficiency.” Datawisp is no exception - admittedly we’re also guilty of using some of the same language as everyone else: “ask any question in natural language and get an answer in under five seconds.” Right.

So how well do these products actually work? We put Datawisp head to head with products made by some of the biggest companies in BI and ran the shootout ourselves to save you time (foreshadowing: this took an incredible amount of time). We included:

  • Thoughtspot Sage

  • Microsoft PowerBI + Copilot

  • Amazon Quicksight + Q

  • Databricks Assistant

These platforms all make the same promise of taking questions asked in natural language and delivering charts and tables, so we felt the comparison was a fair one.

We also tested Snowflake’s solution and attempted to test Tableau + Einstein (more on that later).

Here’s what we found.

Quantitative benchmark scores

Our benchmark has 30 different questions that range in complexity and clarity (how well the question is written), accompanied by a series of matching datasets. We fed all the necessary data into each platform, then asked all 30 questions and graded the responses (we describe our methodology in full later on). Here’s how the platforms performed:

How each competitor did in our benchmark (format: Very good / Good / Incorrect / Error):
  • Datawisp: 11/15/3/1

  • Thoughtspot Sage: 10/5/13/2

  • PowerBI + Copilot: 5/4/15/6

  • Quicksight + Q: 4/2/19/5

  • Databricks Assistant: 5/4/20/1

Immediately, you’ll notice that Datawisp had both the most correct answers and, just as importantly, the fewest (by far) incorrect answers. While we don’t want to speculate as to how the other products work (they’re essentially black boxes), we have an idea of why performance looks like this, which we’ll get into in more detail.

Testing methodology

We built our benchmark with the help of a third-party data scientist during the development of Datawisp, to measure changes in performance resulting from tweaks to our AI logic. The questions were based on ones we encountered while talking to clients, and in general, a higher benchmark score led to better performance with real-life client data. We’re therefore confident that this benchmark is a reasonably unbiased test of an AI-powered analytics tool’s ability to answer client questions - even if we had a hand in developing it ourselves.


We manually graded each answer according to these criteria:

  • Very good: a perfect answer

  • Good: a mostly correct answer

  • Incorrect: a wrong answer

  • Error: the tool either gave no result, or the question caused it to crash or error
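To make the scoring concrete, here is a minimal sketch of how graded answers collapse into score lines like the ones above. The grade labels mirror our criteria; the sample grades are purely illustrative, not real benchmark data:

```python
from collections import Counter

GRADES = ["Very good", "Good", "Incorrect", "Error"]

def score_line(grades):
    """Collapse a list of per-question grades into a 'VG/G/I/E' summary."""
    counts = Counter(grades)
    return "/".join(str(counts.get(g, 0)) for g in GRADES)

# Illustrative grades for a 5-question run (not real benchmark data):
sample = ["Very good", "Good", "Very good", "Error", "Incorrect"]
print(score_line(sample))  # -> 2/1/1/1
```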

Quantitative results analysis

Correct and semi-correct answers

Datawisp had both the highest number of fully and partially correct answers. We believe this is by design: Datawisp was built with AI in mind to answer questions like these. The other platforms on this list are primarily data visualization tools meant to be used by data scientists, with AI added on later as an extra (this is apparent in many places, including usability and customer support, which we’ll discuss later on). Many of these platforms expect data scientists to manipulate data with SQL before bringing it in for the visualization stage, so the AI powering the visualization tool doesn’t have the full set of features it needs to answer these questions completely.

The way the end user can slice and manipulate the data is somewhat limited in traditional BI applications.  Integrating AI into these tools means the AI is limited in the same way. The end result is AI that can save a BI user a few clicks but isn’t very powerful on its own.

Incorrect answers and errors

Datawisp also had the fewest incorrect answers and errors. We think this is mainly due to three things:

  1. Logic: we’ve spent a lot of time iterating on our AI’s logic to make sure that it takes many of the same steps that a data scientist would take - evaluating the data, formulating a query, checking its work, etc. 

  2. Datawisp blocks: our proprietary blocks act as guardrails and ensure that any code the AI writes can produce a sensible output. Datawisp first takes the AI generated code and translates it into Datawisp blocks, then compiles those blocks into SQL and runs the query on the database. If either code fails to run, the AI can re-attempt to solve the issue slightly differently.

  3. Datawisp is designed for non-technical users who can’t tell the difference between a right and wrong answer - so it attempts to avoid incorrect answers even at the cost of delivering the best possible answer in some cases (which is why it has so many partially correct results).
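To illustrate the generate-validate-retry loop in step 2 above, here is a rough sketch. Every function name here is a hypothetical stand-in for illustration, not Datawisp’s actual API, and in this toy version no step ever actually fails:

```python
# Hypothetical sketch of a generate -> validate -> compile -> run loop
# with retries. None of these helpers are real Datawisp APIs.

class StepFailed(Exception):
    """Raised when translation, compilation, or execution fails."""

def generate_code(question, error=None):
    # Stand-in for the AI call; a real system would feed the prior
    # error back so the model can adjust its next attempt.
    return f"-- code for: {question} (prior error: {error})"

def to_blocks(code):
    # Stand-in for translating AI output into validated blocks.
    return {"blocks": code}

def to_sql(blocks):
    # Stand-in for compiling blocks into SQL.
    return f"SELECT 1 /* {blocks['blocks']} */"

def run_query(sql):
    # Stand-in for executing the SQL against the database.
    return [("ok",)]

def answer(question, max_attempts=3):
    error = None
    for _ in range(max_attempts):
        try:
            code = generate_code(question, error)
            sql = to_sql(to_blocks(code))  # either step may raise StepFailed
            return run_query(sql)
        except StepFailed as e:
            error = str(e)  # carry the failure into the next attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")

print(answer("average order value by payment method"))
```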

At Datawisp, we think minimizing incorrect answers is key to building a reliable, trustworthy data platform that everyone across the enterprise can confidently rely on to answer questions.

We spent a lot of time testing all of these solutions and took notes throughout the process. Here are some insights we collected about each platform while going through the benchmark questions:

Thoughtspot

Thoughtspot’s AI feels more native to the product, not “glued on” like the others on the list - even if the quality of the AI itself leaves something to be desired. The results were disappointing even for some very simple questions. Sage (the AI) tries to explain its logic, but doesn’t do so as well as the other products, and when it fails it doesn’t really provide any actionable next steps.

Example: What is the most common number of payment installments for payments over $500?

Here, Sage just decides to answer a different, much easier question for no apparent reason.

PowerBI

PowerBI seems to favor erroring over giving wrong or incomplete answers - it had far more errors than the alternatives, although at least its error messages were understandable.

Example: Which products are frequently bought together?

Power BI Q&A does not support identifying frequently bought together products directly. However, you can analyze the dataset to find patterns of products that are often purchased in the same order. This typically involves more advanced data analysis techniques such as market basket analysis or association rule mining, which are not supported here.

Due to technical limitations, such as only being able to join on one column and not supporting multiple aggregations, PowerBI was unable to answer more complex questions, and this was reflected in its score.
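The “market basket analysis” PowerBI’s reply alludes to is, at its simplest, just counting how often two products appear in the same order. A toy sketch with entirely made-up order data:

```python
from collections import Counter
from itertools import combinations

# Made-up orders: order_id -> products purchased together
orders = {
    1: ["mug", "coffee", "filter"],
    2: ["mug", "coffee"],
    3: ["mug", "coffee"],
}

pair_counts = Counter()
for products in orders.values():
    # count each unordered product pair once per order
    pair_counts.update(combinations(sorted(set(products)), 2))

print(pair_counts.most_common(1))  # -> [(('coffee', 'mug'), 3)]
```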

Quicksight

The most disappointing platform we tested. Its weak AI (Q) was unable to meaningfully transform data, offered very little help getting datasets organized and joined, and created only passable charts when it did get an answer. The AI does share its interpretation of the question asked, but most of the time the only use for that is to see how poorly Q understood the question:

Example: How many different products have a freight value more than half of the price?

Interpreted as: Unique number of Product Id where Freight Value is more than 50.

Without even understanding the question, answering it becomes impossible.
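For reference, the question Q misread takes only a few lines to answer correctly. This sketch uses made-up rows, with column meanings assumed from Q’s own interpretation:

```python
# Toy order-item rows: (product_id, price, freight_value) - made-up data
items = [
    ("a", 10.0, 6.0),
    ("a", 10.0, 2.0),
    ("b", 8.0, 5.0),
    ("c", 20.0, 4.0),
]

# "freight value more than half of the price", counted over distinct products
products = {pid for pid, price, freight in items if freight > price / 2}
print(len(products))  # -> 2
```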

Databricks

Databricks Assistant is targeted at technical users, and seems to favor returning a false result rather than erroring - presumably because it assumes data scientists can easily identify wrong answers. This is okay for audiences who can reasonably check the work of the AI but awful for non-technical folks who may blindly accept the AI’s result, especially given the number of wrong answers. Joins must be done manually, which means that any question requiring a join is off limits to a non-technical user. 

The Databricks assistant performed the worst out of all the tools we tested, even failing to understand really simple questions. 

Example: Do different payment methods have different order values?

Here, most competitors averaged order value grouped by payment method, but Databricks Assistant simply summed total orders by method, which doesn’t answer the question.
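To make the difference concrete: the question calls for an average per payment method, not a total. A sketch with made-up numbers (the second computation is our best guess at what the Assistant produced):

```python
from collections import defaultdict

# Toy orders: (payment_method, order_value) - made-up numbers
orders = [
    ("card", 100.0), ("card", 50.0),
    ("voucher", 20.0), ("voucher", 10.0),
]

values = defaultdict(list)
for method, value in orders:
    values[method].append(value)

# What most tools did: average order value per payment method
avg = {m: sum(v) / len(v) for m, v in values.items()}
print(avg)  # -> {'card': 75.0, 'voucher': 15.0}

# A plausible reading of what the Assistant did: a plain tally of
# orders per method, which says nothing about order *values*
counts = {m: len(v) for m, v in values.items()}
print(counts)  # -> {'card': 2, 'voucher': 2}
```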

User experience and other qualitative metrics

We also compared our experience of working with these tools relative to Datawisp and found they differ greatly in usability. We graded them 1 (bad) through 5 (great) on ease of getting started, support, usability, AI-made charts, cost / cost transparency, and documentation.

Thoughtspot Sage

  • Getting Started: 4

  • Support: 3

  • UI/UX: 2

  • AI Charts: 4

  • Cost/transparency: 4

  • Documentation: 3

Notes: Thoughtspot is the smallest company on this list. The product is cheaper and much easier to get started with than its competitors. However, its UI is not particularly intuitive (things move unexpectedly) and the AI is fairly weak. It answers very simple, straightforward questions well as long as your data is clean. Pricing is fairly reasonable and straightforward, and we never needed to talk to their support. Since there isn’t much friction, we recommend trying it to see if it works for you.

PowerBI + Copilot

  • Getting Started: 1

  • Support: 2

  • UI/UX: 5

  • AI Charts: 2

  • Cost/transparency: 2 

  • Documentation: 1

Notes: PowerBI is free to download, but getting access to Copilot required much more work. After failing to turn it on ourselves, we reached out to support, which recommended a completely different product (ML Workbench) that didn’t meet our requirements. When pressed further, we were sent to a third-party vendor, who initially quoted us a $5k+ option, before we figured out a workaround and paid for Fabric and Copilot per hour through an Azure subscription. Unless you’re a big enterprise client with an existing relationship with Microsoft, getting Copilot to work with PowerBI is a huge pain in the ass.

QuickSight + Q

  • Getting Started: 2

  • Support: 2

  • UI/UX: 1

  • AI Charts: 3

  • Cost/transparency: 2

  • Documentation: 3

Notes: We cannot in good conscience recommend this to anyone. It’s difficult to set up and doesn’t work out of the box with other AWS products (RDS Postgres): we had to meet with our AWS rep to ask for a trial, and our CTO had to build a workaround to get Quicksight to see our data in AWS. The AI performance is subpar and doesn’t make up for any of these other shortcomings. Support wasn’t particularly helpful, and pricing is a mess of various monthly fees that enable certain parts of the product, plus other usage-based fees.

Databricks Assistant

  • Getting Started: 3

  • Support: 3

  • UI/UX: 4

  • AI Charts: 4

  • Cost/transparency: 3

  • Documentation: 2

Notes: Databricks requires a connection to a cloud provider on signup, and needs admin access to your cloud dashboard. Once this is done, getting started is fairly straightforward. Cost, however, isn’t very clear, as it’s consumption-based. Ask the wrong questions and your bill might skyrocket. But they do show you how to calculate cost.

Honorable mention - Snowflake + Copilot

Despite technically not meeting our selection criteria (it must be a strictly natural-language-to-insights BI tool), we ran Snowflake’s solution through our benchmark out of curiosity. This is a more technical product intended for data scientists, and while it does take questions in natural language, its output is always SQL, never a table, chart, or number. Snowflake does give the option to run the SQL in a notebook, but it’s not done automatically.

Unsurprisingly, because of the text-to-SQL approach, Snowflake had the most correct answers with 16 (beating even Datawisp), and 3 partially correct. However, it had 11 total wrong/error answers compared to Datawisp’s 4. For data scientists who can quickly identify wrong answers, this is a great platform that can save a lot of time, and we’d recommend using it. However, we wouldn’t feel comfortable giving this to a non-technical person simply because of the error rate and because it lacks a code-free way of verifying the AI’s work (like Datawisp blocks).

Relative to Snowflake, Datawisp is a bit more conservative, but more reliable - perhaps the difference comes down to the intended audience.

An obvious limitation of Snowflake’s solution is that your data must exist on Snowflake. In that sense, it’s not really a BI tool.

Since we have an existing relationship with Snowflake, our rep was able to get us started with Snowflake’s Copilot fairly quickly, but it did require setting up a call where they tried to upsell us on various other features and “learn more about our business.” Pricing is a complex equation based on credit usage, and Copilot is wrapped into this.

The elephant in the room

Where’s Tableau in all of this? Why didn’t we test the biggest name in BI?

Frankly, we tried. While we were able to get Tableau running just fine, Einstein (Tableau’s AI) is only available for enterprise clients. We spent over two weeks going back and forth with different folks at Salesforce from sales to tech support, and as of the writing of this article, we still haven’t heard back, despite being promised a trial.

Still nothing as of today

TL;DR

The biggest takeaway here is that most of these products are simply traditional BI tools with AI sprinkled on. They just weren’t designed to be asked questions in natural language. Microsoft, Salesforce, etc. already have huge client bases that depend on these products, and they can’t freely make radical changes to them without upsetting a lot of folks.

Datawisp, on the other hand, was built specifically to be used in this way. It shows not only in the number of correct answers but also in the low number of errors and wrong answers. Text-to-SQL can be very powerful, as Snowflake showed us, but balancing performance and reliability is hard. We think we’ve built something incredible that businesses can rely on today, and will only continue to improve over time.