Here are a few important things to keep in mind when diving into your next data analytics project:
1. Never assume that your dataset is clean
Clean, clean, clean. Data cleaning is the process of uncovering and correcting, or potentially eliminating altogether, inaccurate or duplicate records in your dataset. It is imperative that you work through this step before beginning any analysis. This is particularly important if you will be presenting your findings to business teams who will use them to make decisions. Teams need confidence that they are acting on a reliable source of information.
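To make this concrete, here is a minimal sketch of a cleaning pass in plain Python. It assumes your records are dictionaries, that a duplicate is any row repeating the same key fields (ignoring case and surrounding spaces), and that "inaccurate" can be expressed as simple validation rules. The function name `clean_records`, the sample rows, and the `age_in_range` rule are all made up for illustration; your own dataset will need its own rules.

```python
def clean_records(records, key_fields, validators):
    """Split records into clean rows, rejected rows, and a duplicate count.

    key_fields: column names that together identify a repeat record (assumed).
    validators: mapping of rule name -> function(row) returning True if valid.
    """
    seen = set()
    clean, rejected = [], []
    duplicates = 0
    for row in records:
        # Normalize the key so "A@example.com " and "a@example.com" match.
        key = tuple(str(row.get(f, "")).strip().lower() for f in key_fields)
        if key in seen:
            duplicates += 1  # repeat record: keep only the first occurrence
            continue
        seen.add(key)
        failed = [name for name, check in validators.items() if not check(row)]
        if failed:
            # Set aside for review instead of silently deleting.
            rejected.append((row, failed))
        else:
            clean.append(row)
    return clean, rejected, duplicates


# Hypothetical sample data, for illustration only.
rows = [
    {"id": "1", "email": "a@example.com", "age": "34"},
    {"id": "2", "email": "A@example.com ", "age": "34"},  # repeat of row 1
    {"id": "3", "email": "b@example.com", "age": "-5"},   # implausible age
]
rules = {
    "age_in_range": lambda r: r["age"].isdigit() and 0 <= int(r["age"]) <= 120,
}

clean, rejected, duplicates = clean_records(rows, ["email"], rules)
print(len(clean), "clean;", len(rejected), "rejected;", duplicates, "duplicate(s)")
```

Rejected rows are returned along with the rule they failed rather than being dropped, so you can decide case by case whether to correct or eliminate them.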
2. Start with a specific question and hypothesis
Once you’ve completed the cleaning process, you may find yourself looking at the dataset with a number of questions swirling in your head. There is so much potential analysis in front of you! Just be cautious and proceed slowly. Don’t try to tackle too much at once. Make sure that you are beginning your analysis with a very focused and specific question.
If the request for analysis is coming from a business team, push them to provide explicit details about what they are hoping to learn, what they expect to learn, and how they will use the information. A focused question may also let you eliminate some unnecessary variables from the dataset, which helps keep your analysis on track.
3. Don’t be biased by having a hypothesis in mind
Fantastic, your dataset is clean and you have zeroed in on a specific question! Next, make certain that you remain unbiased as you work through your analysis. Many analysts will tell you that it can be tempting to use data to tell the story that you or your colleagues want or expect to hear.
But you have to let the dataset speak for itself. Stay alert to the fact that maintaining objectivity is not as easy as it sounds. It’s okay if the data isn’t telling you what you expected to hear, because that’s a finding in and of itself!
4. Documentation is key
I find it useful to keep a log of my data analysis for several reasons. First, it is a place where I can discuss any limitations or special circumstances that I encountered along the way. Second, with this guide in hand, a colleague can more easily review or critique the analysis.
Finally, if my analysis is replicated with an updated or new dataset, I can be confident that it is conducted in a way that allows for a true comparison with the prior work.
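A log like this does not need special tooling. As one possible sketch, the snippet below appends each step to a plain text file, one JSON entry per line, and optionally records a SHA-256 fingerprint of the dataset so that a later rerun can tell whether it used the same file. The helper names `fingerprint` and `log_step`, the file names, and the sample notes are assumptions for illustration, not a standard format.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def fingerprint(path):
    """Return the SHA-256 hex digest of a file, to identify the dataset used."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_step(log_path, step, note, dataset_path=None):
    """Append one analysis step to the log as a single JSON line."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "note": note,
    }
    if dataset_path is not None:
        entry["dataset_sha256"] = fingerprint(dataset_path)
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


# Demo in a temporary folder with a made-up dataset.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "survey.csv"
    data.write_text("id,score\n1,4\n2,5\n", encoding="utf-8")
    log = Path(tmp) / "analysis_log.jsonl"

    log_step(log, "load", "Loaded raw export; no rows dropped yet.", data)
    log_step(log, "limitation", "Scores are self-reported.")

    lines = log.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines]

print(len(entries), "steps logged")
```

Because each entry is a self-contained line, a colleague can read the log top to bottom, and a future rerun can compare its dataset fingerprint against the one recorded here.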
5. Always investigate the whys
Finally, as you near the conclusion of your analysis, remember that this dataset is only one piece of the puzzle. It is critical to pair your quantitative findings with qualitative information, which you can capture through questionnaires, interviews, or testimonials. While the dataset can tell you what is happening, the qualitative information can often point you toward why it is happening.