Overview
In this class, you will write up data analyses in which you practice one or more concepts that we have learned in the class. One question that often arises is “What makes a good data analysis write up?”
This is a fantastic question, and the answer becomes easier to recognize once you have seen enough examples of good data analyses or written enough of them yourself. For those who are new(-ish) to writing up data analyses, I am here to share the bad news that I do not have a simple checklist for you to follow. However, I have put together some thoughts below that should help you write better data analyses.
Audience
A primary goal of building a data analysis is to extract knowledge and insights from examining data (J. W. Tukey 1962; W. Tukey and Wilk 1966). This knowledge is often encoded into machine learning algorithms or related data products to facilitate use by large numbers of users. Yet, the discussion of how to build a data analysis that is trusted by its users often proceeds without explicit reference to an audience or consumer for whom the data analysis is being developed.
Indeed, there is much for a data analyst to consider on their own with respect to statistical techniques, visualization methods, data processing approaches, and computational algorithms that do not involve the needs or requirements of an audience member in particular. However, a critical goal for many data analyses is to be useful or persuasive to another person (Kimball 1957). The audience could range from simply the person doing the analysis to a much larger external group.
It is worth thinking about how to build data analyses that are trustworthy, so that others can trust the work an analyst produces. The extent to which results from data analyses are used for key policy decisions heightens the need for trust between analyst and audience. Broderick et al. (2023) note that complex data analyses with long data pipelines present numerous opportunities for trust to break down. In particular, they note that trust can break down if the evidence generated by an analysis is not useful for decision-making. An alternative framework is proposed by Yu and Barter (2024), who argue that trustworthiness in the data science life cycle can be achieved through building analyses that have predictability, computability, and stability. A principle that ties both frameworks together is the characterization of the development of trust as primarily being in the hands of the analyst, via decisions made about study design, data collection, model choice, and other aspects of the data science process.
One approach to addressing the challenge of building a useful analysis is to consider the audience as part of the design of the analysis itself. Hence, it can be a valuable exercise to think about who the audience is that you (as the data analyst) have in mind when you build the data analysis. Some examples of who a data analysis write up is written for could include:
- A primary non-technical audience: This could be a collaborator or client. They might be interested in a higher-level, non-technical pass of the write up (primarily focused on the introduction, main figures, and conclusions to find out what you learned). They may or may not read the technical details to see if anything stands out or is unexpected. Here you want to build your data analysis with their expertise in mind: prioritize the type of information that they find most relevant (for example, maybe they need to see a specific type of plot) and move more of the technical details to the appendix.
- A secondary non-technical audience: This is likely a non-technical manager, an executive, or someone who is crafting new policies based on the results of your data analysis. These individuals will skim most of the components of your analysis and will be focused on the “headlines” of your work. You want to summarize your key findings clearly and succinctly.
- A primary or secondary technical audience: This could be a technical manager or an advisor. Here, this individual will carefully read all components, including the main technical details in your data analysis. They may challenge you to justify your data analytic choices and push you to improve your technical writing, your code, and data visualizations. Here, you want to make sure the details of your report are easily readable, succinct (not a list of everything you did over the whole analysis), and compelling to justify the results you summarize in the conclusions.
For more information on the audience of a data analysis, I encourage you to read Quantifying the Alignment of a Data Analysis Between Analyst and Audience (McGowan, Peng, and Hicks 2025).
Structure
Now that you have an audience in mind, the next question you might have is how to structure a data analysis. This is typically different from other professional writing you have done (e.g. a research article for a peer-reviewed journal) because the structure of analyses can vary so much depending on the audience. In traditional research articles, there is typically less flexibility. I should also add that this is not a skill people are born with. It takes practice to hone the skill of writing good data analyses, and the more you do it, the easier it will become.
Generally speaking though, the structure of a data analysis will include:
- An introduction and/or motivation of the problem being addressed
- What is the question being asked that you will then investigate with data?
- The body (may or may not include many of the technical details)
- Key conclusions and/or discussion of limitations of the data analysis
- Appendix (where often more of the technical detail goes if writing for a non-technical audience)
The “body” is often quite different depending on who the audience is. You want to describe the computational approach(es) you took to investigate the question asked with the data available. Ideally, the description is concise but still informative.
For example, instead of saying “I used regression”, you could say “I fit a linear regression model to the price of diamonds using the size and clarity of the diamond as predictors”. You want to justify why these are relevant features. The body will often contain many plots or tables. This choice is up to you, but if you think a plot would help explain an important detail to the audience, then include it. You don’t want to include dozens or hundreds of plots without an explanation or rationale for why you included them.
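To make the diamonds example concrete, here is a minimal sketch in Python of what might sit behind such a sentence. Everything in it is an assumption for illustration: the eight rows of data are made up (they are not the real diamonds data), size is measured in carats, clarity is encoded as a hypothetical numeric score, and the helper names `solve_linear_system`, `fit_linear_regression`, and `describe_fit` are invented here. It fits ordinary least squares using only the standard library and then turns the coefficients into a specific statement that names the model, the outcome, and the predictors.

```python
# Illustrative only: made-up rows, NOT the real diamonds dataset.
# Each row is (carat, clarity_score, price_usd). The clarity score is a
# hypothetical ordinal encoding where a higher number means a clearer stone.
DIAMONDS = [
    (0.3, 2, 700.0),
    (0.5, 4, 1900.0),
    (0.7, 1, 2300.0),
    (0.9, 3, 3500.0),
    (1.0, 5, 4300.0),
    (1.2, 2, 4600.0),
    (1.5, 4, 6200.0),
    (2.0, 3, 8000.0),
]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations update both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(rows):
    """Ordinary least squares for price ~ intercept + carat + clarity.

    Returns [intercept, carat_coefficient, clarity_coefficient].
    """
    design = [[1.0, carat, float(clarity)] for carat, clarity, _ in rows]
    y = [price for _, _, price in rows]
    k = 3
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def describe_fit(coefs):
    """Turn fitted coefficients into a specific sentence for the write up."""
    _intercept, per_carat, per_clarity = coefs
    return (
        "I fit a linear regression model to the price of diamonds using "
        "size (carat) and clarity as predictors. Holding clarity fixed, one "
        "additional carat is associated with a price change of about "
        f"${per_carat:,.0f}; holding carat fixed, one additional clarity step "
        f"is associated with a change of about ${per_clarity:,.0f}."
    )


if __name__ == "__main__":
    print(describe_fit(fit_linear_regression(DIAMONDS)))
```

In a real analysis you would more likely call a statistics library than write the linear algebra by hand. The point of the sketch is only that the sentence in the write up names the model, the outcome, and the predictors, and reports what was found in terms the audience can interpret.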
Introduction and motivation
Some things to consider are:
- Who is your intended audience for the data analysis?
- What is the question you are trying to answer?
- Why is this question important or interesting? If appropriate, mention prior work or background that motivated this analysis.
- What data are you using? Identify the dataset(s), their source, and key features. This should be concise as the data can be described in greater detail elsewhere in the write up.
- What is the expected contribution or insight? What does this analysis add or clarify?
Body
Some things to consider are:
- You want to clearly label figures, give them informative captions, and refer to them in the text in the order they appear in the write up (e.g. Figure 1, 2, 3, etc.).
- Showing the code could be fine for certain audiences, but it could also be useful to just show the output of the code for other audiences as evidence for the results you summarize in the conclusions.
Conclusion and discussion
Some things to consider are:
- What did you learn from the analysis?
- What are the limitations of the data used or the data analysis?
- Were you able to answer the original question asked or a different question?
Some Dos and Don’ts
- When in doubt, use shorter words and sentences.
- A common pitfall in report writing is to recount your thought process step by step — for example: “First I did this, but it didn’t work. Then I tried something else and found A, B, and C. I wasn’t sure what to make of B, but C seemed interesting, so I followed up with D and E.” Avoid this approach. While your attention to detail is commendable, this style comes across as unpolished and unfocused.