“I have no data yet. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” (The Complete Sherlock Holmes, Volume 2, p. 189)
A colleague once asked me what the best tools were for doing data analysis. At first I wasn’t sure what she meant, then I realised that she wanted some kind of script to run that would give her an answer: data arrives, you follow a formula, and you report what you find.
It doesn’t work like that, and even when it’s easy, it’s not usually that simple.
There is an article in the Guardian about ‘How to be a data journalist’, and when answering the question of where to begin they suggest the following:
“So where does a budding data journalist start? An obvious answer would be “with the data” – but there’s a second answer too: “With a question”.”
I don’t agree with the above. For me it’s always about the question. When you don’t have a question but you do have data, you use the data to find a question. It’s always about the question, because otherwise what are you writing about, and how will you know when you’re done?
The premise of the Guardian article is that ‘data journalists’ are compartmentalised and do certain things, but to me that seems slightly backwards. You don’t walk around looking for numbers. The article says, ‘The Guardian’s Charles Arthur suggests “Find a story that will be best told through numbers”’, which sounds a bit like a data journalist is someone who walks around with a metaphorical hammer looking for just the right nail.
The implication is that someone who didn’t know how to use ‘data’ wouldn’t be able to answer questions that involved analysing numbers. That just can’t be right. You have a question and you use any means (within reason) to answer it.
If the source of information is numerical data then there are certain skills you can use and some of them involve statistics, presentation, context or analysis. Other data can include text, documents, speeches, actions, music, stories etc.
When Sherlock talks about obtaining more data, he is not talking about numbers, although they may be part of it. He is talking about information on which to base a conclusion, to find a solution. I agree that it helps to teach useful skills: finding data, interrogating data, visualising data, mashing data. That last concept is new to me, but I will look it up later.
For now I’ll finish up with what I told my colleague when she wanted to know what to do: What is the question? What is the data? What do you want to do with it? How will you know when you’re done?