How do you tackle an open-ended data analysis request? An Engineer’s approach

3 minute read

Find something “interesting” in this dataset and present it back to us.

When you get given a (new) dataset and are told to find something “interesting” in it, where do you start?

If you’ve ever been in this catch-22 situation, this is either a wonderful thing to hear or your worst nightmare.

Wonderful because you now have free rein to see what is hiding in the data. Or.

Worst nightmare because:

  • you don’t know where to start,
  • there is no definition of what “interesting” is,
  • it is not clear when you’ll ever be done, and
  • what if there is something interesting but you miss it?

There are so many paths you could go down, metrics and visuals you could come up with.

How can you be sure you’re going to get this right?

The temptation might be to find any kind of answer quickly.

You might look at the data you’ve been given and begin pulling all kinds of metrics or charts together in the hope that these will be deemed interesting.

I think this is quite common as it gives us the feeling we are making progress and can at least show we are doing something against this abstract request.

If you don’t happen to find anything (the team think is) “interesting” then perhaps you could argue that it isn’t your fault. You’ve done everything you can with the data you were given and anyway they should have been more specific.

Sometimes, you get lucky this way.

Other times you don’t.

If you don’t, then whilst you might (rightly) feel the team should have been (much) better at defining what “interesting” was, perhaps you do wonder if there might just have been a better, more constructive way?

What if you had total clarity of what success looked like from the outset?

What if there was no doubt what you were actually trying to achieve?

What if you could be certain what “interesting” actually looked like, and could help your team recognise it even if they don’t realise it themselves yet?

With a crystal clear objective, you could focus all your energies on the right areas, confident that if there was anything interesting, you’d find it.

I would start by thinking about the wider context first, and specifically how everything interacts.

Regardless of what you’ve specifically been asked to investigate, I find it critical to clarify how everything fits together, what is deemed a good outcome and what is deemed a bad one.

Too often I think things get taken for granted and people, on all sides, end up disappointed.

As a consequence it is worth restating all your assumptions, just to be sure you are not missing anything.

For example you might clarify:

  • More profit is a good thing, right?
  • How are we calculating profit?
  • Is there ever a time when more profit isn’t a good thing?
  • More regional growth is better, yeah?
  • Faster delivery is what we’re after, isn’t it?

Perhaps it sounds odd but what I’m effectively trying to do with this approach is get a pre-commitment of what the team think is really “interesting” or not.

Through this line of simple questioning you can unearth many hidden assumptions which will save you no end of time when you come to actually analysing the data.

Reasoning from First Principles.

In engineering we refer to this approach as looking at things from “First Principles.”

Personally, I’ve found it perhaps one of the most powerful approaches you can take in problem solving.

It has enabled me to walk into any number of new situations and organisations with confidence.

Seems I’m not alone, as in researching this article I found this quote from Elon Musk:

Well, I do think there’s a good framework for thinking. It is physics. You know, the sort of first principles reasoning. Generally I think there are — what I mean by that is, boil things down to their fundamental truths and reason up from there, as opposed to reasoning by analogy.
Through most of our life, we get through life by reasoning by analogy, which essentially means copying what other people do with slight variations.

Elon Musk, TED Curator Talk

I’d therefore suggest that rather than thinking “what metric can I come up with from this dataset”, you switch your thinking to first trying to really understand what is going on, what is sensitive and what is good or bad in your own organisational situation.

To produce anything interesting for your team, you therefore need to have a solid grasp of the fundamentals of the challenge at hand.

For example, consider a typical business.

If you’re presented with a dataset containing transaction data, try coming up with fundamental principles that affect the business, irrespective of the data.

Nothing is too “obvious”. Dismissing something as obvious is the thinking trap you’re looking to avoid.

Clarify whether the business fundamentally exists for profit. If it does, then profit metrics (sale price minus costs) are always going to be interesting.
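As a minimal sketch of that profit calculation, assuming each transaction records a sale price and a couple of cost fields: the `Transaction` class, its field names and the sample figures are all hypothetical, and your own dataset will likely split costs differently.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    product: str
    sale_price: float     # what the customer paid
    unit_cost: float      # cost of the goods sold (hypothetical field)
    shipping_cost: float  # one example of an extra cost (hypothetical field)


def profit(t: Transaction) -> float:
    """Profit for one transaction: sale price minus all known costs."""
    return t.sale_price - (t.unit_cost + t.shipping_cost)


def total_profit(transactions) -> float:
    """Sum the per-transaction profit over a collection of transactions."""
    return sum(profit(t) for t in transactions)


# Made-up sample data, purely for illustration.
sample = [
    Transaction("x", sale_price=100.0, unit_cost=60.0, shipping_cost=10.0),
    Transaction("y", sale_price=50.0, unit_cost=20.0, shipping_cost=5.0),
]

print(total_profit(sample))  # 55.0
```

Writing the formula down like this is also a handy prompt for the “How are we calculating profit?” question: the team can tell you straight away which costs are missing.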

Does the business consider its growth or decline important? If so, growth can mean different things to different organisations, so clarify which measure matters. It might be volume numbers (like the number of sales per product line) and/or various revenue metrics (e.g. revenue per type of product, broken down per region).
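One way to make the growth question concrete is to total volume and revenue per region, product and year, then compare each year with the one before. The sketch below assumes rows shaped as (year, region, product, units, revenue); that layout, the helper names and the numbers are illustrative assumptions rather than anything from a real dataset.

```python
from collections import defaultdict

# Made-up rows: (year, region, product, units, revenue)
rows = [
    (2022, "Europe", "x", 100, 1000.0),
    (2023, "Europe", "x", 80, 800.0),
    (2022, "Asia", "x", 50, 500.0),
    (2023, "Asia", "x", 75, 750.0),
]


def totals_by(rows, value_index):
    """Sum one numeric column per (region, product, year)."""
    totals = defaultdict(float)
    for row in rows:
        year, region, product = row[0], row[1], row[2]
        totals[(region, product, year)] += row[value_index]
    return dict(totals)


def yoy_growth(totals, region, product, year):
    """Year-over-year growth as a fraction, or None without a usable baseline."""
    previous = totals.get((region, product, year - 1))
    current = totals.get((region, product, year))
    if not previous or current is None:
        return None  # missing or zero prior year: growth is undefined
    return (current - previous) / previous


units = totals_by(rows, 3)
revenue = totals_by(rows, 4)

print(yoy_growth(units, "Asia", "x", 2023))      # 0.5
print(yoy_growth(revenue, "Europe", "x", 2023))  # -0.2
```

Keeping volume and revenue as separate totals makes it easy to ask the team which of the two they actually mean by “growth”.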

What costs is the business concerned with? What operational costs are there, and what are they sensitive to (e.g. volume, region, season)? These figures will be interesting if they unearth something the team didn’t know, or even if they just clarify what the team already expect.

Interesting can be confirmation of what’s known too!

Levelling up. A business could be looking to benchmark its performance against competitors. To do this you can start to combine the profit and cost measures and generate an efficiency measure (such as profit % by volume). Do this and you’ve created something else interesting.
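“Profit % by volume” can be read in more than one way, so the sketch below shows two possible interpretations side by side: profit as a percentage of revenue, and profit per unit sold. Both function names and the regional figures are assumptions made up for illustration; confirm with the team which measure they would actually benchmark on.

```python
def profit_margin_pct(revenue: float, costs: float) -> float:
    """Profit as a percentage of revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return 100.0 * (revenue - costs) / revenue


def profit_per_unit(revenue: float, costs: float, units: int) -> float:
    """Profit divided by the number of units sold."""
    if units == 0:
        raise ValueError("units must be non-zero")
    return (revenue - costs) / units


# Made-up regional figures, purely for illustration.
regions = {
    "Europe": {"revenue": 1000.0, "costs": 750.0, "units": 100},
    "Asia": {"revenue": 800.0, "costs": 680.0, "units": 80},
}

for name, r in regions.items():
    margin = profit_margin_pct(r["revenue"], r["costs"])
    per_unit = profit_per_unit(r["revenue"], r["costs"], r["units"])
    print(f"{name}: margin {margin:.1f}%, profit per unit {per_unit:.2f}")
```

Because both measures are ratios, they let you compare regions (or competitors) of very different sizes on the same footing.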

If you don’t have a clear spec, ask questions from first principles to create your own.

Once you have your fundamentals, your data and your metrics, you then need to present them in an interesting way.

When presenting the data, people are interested in stories.

Stories manifest in data through trends, changes and outliers. If you can benchmark with your data then stories can show status, aspirations and direction.

The kind of story that might be possible to find in transaction type business data could be:

Asia loves product x since last year but Europe now hates it. That is great in one way but it takes a lot longer and costs the business more to ship that product to Asia.
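A story like this usually starts as a simple comparison between two periods. As a rough sketch, the hypothetical `notable_changes` helper below flags regions whose sales of a product moved by more than a chosen threshold; the 25% threshold and the sales figures are arbitrary assumptions, not recommendations.

```python
def notable_changes(previous, current, threshold=0.25):
    """Return {key: fractional change} where the change is at least the threshold."""
    flagged = {}
    for key, before in previous.items():
        after = current.get(key)
        if after is None or before == 0:
            continue  # no comparable baseline for this key
        change = (after - before) / before
        if abs(change) >= threshold:
            flagged[key] = change
    return flagged


# Made-up unit sales of product x per region, purely for illustration.
last_year = {"Asia": 200, "Europe": 400, "Americas": 300}
this_year = {"Asia": 300, "Europe": 240, "Americas": 306}

for region, change in notable_changes(last_year, this_year).items():
    direction = "up" if change > 0 else "down"
    print(f"{region}: product x sales {direction} {abs(change):.0%}")
```

The flagged regions are only candidates for a story. Whether a swing is good or bad still depends on the fundamentals you agreed with the team, such as the extra cost of shipping to a region.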

To then make an actionable recommendation you can implicitly or explicitly suggest:

I therefore think Mr CEO that we should consider setting up an Asia hub to reduce our shipping time and costs. This could have x positive impact on Asia market sales and y positive impact on profitability through reduced shipping costs even accounting for z setup costs and even us holding 2x levels of stock. Look here in the data …

“Interesting” translates to thought-through analysis and relevant suggestions.

By thinking of things from first principles, we are guaranteed to be providing relevant suggestions and thought-through analysis.

This is because we are starting out thinking from a meaningful perspective.

We avoid simply generating numbers and metrics for the sake of it, in the hope that someone senior will work it out for themselves.

As always, I believe it is our job, as data analysts, to help the team see what is valuable in the data.

Taking a first principles approach ensures interest by keeping assumptions to a minimum.

What works for you? How do you approach the “find something interesting in the data” question? Reach out to me at hello@yourdatadriven.com and let me know.