Where to start with your new data project?

4 minute read

You’ve got a load of data and you now want to use it to help your organisation improve.

You’ve read the headlines and “Big-machine-data-learning-AI-science” holds the promise to deliver “intelligent predictions.” This is going to help your organisation make better decisions and in real time.

You appreciate that some of this is marketing spin but you’re sold on the promise. You might even have started doing something useful already, say gathering, organising and visualising your data in a spreadsheet or two.

However, now that you’ve started to dig a bit deeper, things are not so clear.

The marketing headlines make it sound so simple but is it?

Outside of spreadsheets, the technology choice is overwhelming. The products all seem to offer the same things (or parts of things; it’s not clear) and are priced in strange ways.

Determining where to even start with all this, and then the right process to follow, can be baffling, frustrating and almost impossible.

So where do you go from here? How do you approach this?

What if you could follow a few simple steps to get you up and running fast?

How much better would it feel to have clarity of direction, a clear definition of what success looks like and then to know you are heading along on the right track?

What if you knew what technology to use, what data to look at and all without having to get a(nother!) degree from MIT?

Well, first think of this as a project.

A project that will start simple, with great foundations, that you can then build on.

I’d then start by asking yourself these three questions:

1- What issues are you having that have meant you’re looking into this?

You might be looking to identify your most profitable product lines. You might be trying to identify and quantify seasonality. It might be that you’re looking to combine data from different silos for the first time to get an overview of what is really going on.

There are lots of reasons why you might start a project like this. Be clear on the whys:

Why this specific issue?
Why not those other specific issues?
Why now?

Do this and it will help all your future project decisions and help you judge your progress.

2- What data do you have and how is it stored?

Think about the data you’re storing. How is this done? Is it nicely structured in a table? Is it even on a computer? Could it be? How often is it collected? Does it have time stamps? How reliable is it?

Ask yourself these questions about your data to know where you can start.
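If your data is already in a file, a short script can answer a few of these questions for you. Below is a minimal Python sketch, assuming a CSV export with a timestamp column. The column names, the sample rows and the `audit` function are all made up for illustration, so swap in your own.

```python
import csv
import io
from datetime import datetime

# Hypothetical export. In practice this would be a file saved from your spreadsheet.
SAMPLE_CSV = """timestamp,product,units_sold
2024-03-01 09:00,Widget,12
2024-03-02 09:00,Widget,
2024-03-03 09:00,Gadget,7
2024-03-06 09:00,Gadget,9
"""


def audit(csv_text, time_column="timestamp"):
    """Count rows, count empty cells per column, and find the largest gap in time."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # How reliable is it? Count the empty cells in each column.
    missing = {
        column: sum(1 for row in rows if not (row[column] or "").strip())
        for column in rows[0]
    }

    # How often is it collected? Look at the gaps between consecutive time stamps.
    times = sorted(
        datetime.strptime(row[time_column], "%Y-%m-%d %H:%M") for row in rows
    )
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    return {
        "rows": len(rows),
        "missing": missing,
        "largest_gap_days": max(gaps).days,
    }


report = audit(SAMPLE_CSV)
print(report)
```

Even a rough check like this tells you whether a column has holes in it and whether the data arrives as regularly as you thought.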

Don’t panic if you don’t have the data you think you need straightaway.

We often find we can’t explicitly measure what we really want to know, so we have to use data we can get and try to infer what is happening.

In motorsport, for example, the temperature of the tyre is really important for performance. To measure it we use a probe that sticks into the tyre and measures the temperature just below the surface. We can’t put a probe in the tyre when the car is moving. We can see the tyre surface though. So we measure that with an infrared camera and then, based on previous experiments, infer what the tyre temperature just below the surface must be.
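To make the idea concrete, here is a rough Python sketch of that kind of inference. It assumes the relationship between surface and below-surface temperature is close to a straight line, which is my simplification for illustration and not necessarily how it is done at the track. The readings are invented numbers, not real tyre data, and the function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented calibration readings in degrees C, as if taken in earlier
# experiments where both measurements were possible at the same time.
surface_temps = [60.0, 70.0, 80.0, 90.0]  # infrared camera
probe_temps = [70.0, 82.0, 94.0, 106.0]   # probe just below the surface

slope, intercept = fit_line(surface_temps, probe_temps)


def infer_probe_temp(surface_temp):
    """Estimate the below-surface temperature from a surface reading."""
    return slope * surface_temp + intercept


print(round(infer_probe_temp(85.0), 1))
```

The same pattern applies to business data: calibrate once when you can measure both things, then use the one you can see to estimate the one you can’t.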

Map out what data you have and compare this back to question 1), then think about whether you have, or could infer, the data you need to help you make your decisions. Keep it simple at this stage.

3- What have you got already – in terms of reports on your data – that you currently look at?

Clearly you are looking to do something new but start with what you’ve got and try to answer why it is not giving you what you need.

If you’re using Excel (or another spreadsheet) to pull things together but it isn’t giving you what you want, get those limitations clearly stated.

Quite often spreadsheets can do (much) more than people realise so before investing time and money in another technology it is always worth exploring fully what you can do with what is to hand.

I once built a reporting tool in Excel that needed to increase the precision of some GPS measurements by effectively flattening out the earth with some trick NASA equation! Don’t be too quick to dump your spreadsheet if it is what you have access to.

Once you’ve worked through these questions you’ll start to have a pretty good idea of what you’re trying to achieve.

The aim really is to define what success looks like for this stage of your project.

Despite your aspiration to utilise all this latest technology to solve all your issues overnight, the fact is that you might be able to do this a whole lot more easily than that. You might be able to do this empirically.

Next step – Now imagine a timeline.

Think of:
a) things that have happened,
b) things that are happening and,
c) things that will happen

Step a) is where to start – aim to get a good understanding of what has happened.

This is (normally) harder than it sounds …

For the geeks, defining this gets quite important, but what I mean is: can you look at meaningful data for, say, yesterday, last week, last month, last quarter and so on?

If you can’t do this already then focus on that, focus on a).
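If your records carry a date, getting a first view of a) can be as simple as adding them up by period. Here is a minimal Python sketch, assuming each record is just a date and a value. The records and the `totals_by_month` function are hypothetical, and a spreadsheet pivot table would do the same job.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (date, sales value). Replace with your own data.
records = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 1, 20), 80.0),
    (date(2024, 2, 3), 150.0),
    (date(2024, 2, 17), 50.0),
    (date(2024, 3, 9), 200.0),
]


def totals_by_month(rows):
    """Sum the values for each (year, month), returned in date order."""
    totals = defaultdict(float)
    for day, value in rows:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))


monthly = totals_by_month(records)
for (year, month), total in monthly.items():
    print(f"{year}-{month:02d}: {total:8.2f}")
```

Swapping the grouping key for the day, the ISO week or the quarter gives you the other views of what has happened.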

Doing this might give you enough information (together with your experience) to start improving your human-powered predicting ability.

Very few organisations need data in real time, so if you’re just starting out, park b) until you get further into this. It’s really hard (and, top tip, never ask a data geek what real-time actually is …). Again, keep it simple for now.

If you’ve not done this much before, then it is amazing how powerful it can be to simply have relevant historic data laid out in front of you, on a clear chart, table or two. What makes a clear chart or table would be a good article for another day so I will leave it here for this one.

Let me know how you get on and whether these tips have worked for you too. Drop me a line at hello@yourdatadriven.com