Chapter 7 Collaborate on analysis development

Once your team has started meeting regularly, established norms, and built some common analysis templates and/or style guides, you’ll be ready to take the next step in collaboration: actually working on code as a team. This is much easier if you are all working in the same programming language (or languages).

7.1 Code reviews

The first level of coding collaboration your team should try is a code review - a process by which one developer shares their code with other developers to get constructive feedback. In the software development world, code reviews help teams minimize bugs and critical vulnerabilities. Analysis developers can realize similar benefits by conducting code reviews with their teams.

Every project will have buggy code at some point. Often, the lead developer will catch those bugs while coding their analysis, but sometimes an error will slip through their personal review process. If you’re staring at the same lines of code for days, you may miss something that would be obvious to someone looking at the code with fresh eyes. Code reviews are incredibly helpful for spotting and correcting those kinds of mistakes.

A code review also helps an analysis developer check their work for methodological blind spots. Walking through your code with your teammates can help ensure you’re all on the same page about the choices you’ve made in cleaning, transforming, modeling, and visualizing your data. If there are differing opinions on particular choices, a code review is a great forum to resolve those differences.

All of this sounds great - just share your code with your teammates and you’ll squash bugs and resolve any methodological vulnerabilities - but it can also go very poorly if you haven’t established a strong level of trust and open communication on your team. Sharing your code for the first time is a nerve-racking experience. Make sure that when your team tries a code review for the first time, you set some explicit norms for that conversation:

  • Decide at the beginning of the conversation how the presenter would like to take questions and comments. Wait until the end? Jump in anytime? Since it’s their code, let the presenter decide what’s more comfortable for them.
  • If you’re giving feedback, remember to focus your comments and questions on the process and not the person. If you spot an error, don’t lead with saying “You forgot to include variable x in your group_by() call on line 53.” Instead, explain the issue you’re seeing and why you might suggest a different approach. This approach will help all of your teammates to understand the issue and the recommended solution, turning a moment of potential embarrassment into a learning opportunity for the whole team.

Code reviews, when done well, can be a great way to improve code quality and further develop your team’s coding skills. Just make sure you’ve done enough to build your team culture first - this will help ensure that it’s a positive and productive experience.

7.2 Collaborative coding

Direct collaboration on analysis project code is incredibly powerful. Instead of waiting for an analysis project to make it to a code review, collaborative coding makes the development and review of code an organically iterative process.

Collaborative coding can be done in person through pair programming:

“Pair programming is an agile software development technique in which two programmers work together at one workstation. One, the driver, writes code while the other, the observer or navigator, reviews each line of code as it is typed in. The two programmers switch roles frequently.

“While reviewing, the observer also considers the “strategic” direction of the work, coming up with ideas for improvements and likely future problems to address. This frees the driver to focus all of their attention on the “tactical” aspects of completing the current task, using the observer as a safety net and guide.”

Pair programming is a useful process for analysis developers to consider. It essentially creates a live code review as the code is being written. This may work well in some education agencies, but others may find it difficult, particularly if a team works in an open-plan office (noise barriers) or in separate buildings (geographic barriers). However, developers can work around both kinds of barriers by collaborating through Git and GitHub.

Git version control software and the web service GitHub are amazing tools. The learning curve can be a little steep since it involves some interaction with the command line, but there are resources that make adoption much easier. In particular, Jenny Bryan’s Happy Git and GitHub for the useR is a great introduction for R users, including how to use both tools within RStudio.

Once your team has some working knowledge of how to use Git and GitHub, it’s pretty easy to start collaborating on analysis projects. If one analysis developer starts a GitHub “repo” for a project, another developer can “fork” their work, make changes and/or additions, then submit a “pull request” to merge their work into the original repo.
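The workflow described above can be sketched on the command line. This is a minimal illustration under stated assumptions, not a prescribed process: two local directories stand in for the original GitHub repo and a collaborator's fork, and the file name, branch name, and developer identities are all hypothetical. The "fork" and "pull request" steps are GitHub features that happen on the website, so the sketch substitutes a plain clone for the fork and a fetch, diff, and merge for the pull request.

```shell
set -eu

# Scratch directory holding both copies of the project (hypothetical layout).
work="$(mktemp -d)"

# 1. The first developer starts the project repo.
#    (On GitHub this would be the original "repo".)
git init --quiet "$work/original"
cd "$work/original"
git config user.name "Analyst One"        # placeholder identity
git config user.email "one@example.org"
echo 'library(dplyr)' > analysis.R
git add analysis.R
git commit --quiet -m "Start analysis script"

# 2. A second developer makes their own copy.
#    (On GitHub: "fork" the repo, then clone the fork.)
git clone --quiet "$work/original" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Analyst Two"        # placeholder identity
git config user.email "two@example.org"

# 3. They make changes on a separate branch and commit them.
git checkout --quiet -b add-grouping-note
echo '# TODO: group by school and year' >> analysis.R
git commit --quiet -am "Note planned grouping"

# 4. The first developer reviews the proposed change and merges it.
#    (On GitHub: the collaborator pushes the branch and opens a
#    "pull request"; review and merge happen in the web interface.)
cd "$work/original"
git fetch --quiet "$work/collaborator" add-grouping-note
git --no-pager diff HEAD FETCH_HEAD       # review what would change
git merge --quiet FETCH_HEAD              # fast-forward the original repo
```

Keeping the change on its own branch means the original repo stays untouched until someone has looked at the diff, which is the same protection a pull request gives a team on GitHub.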

The beauty of collaborating on code with GitHub is that it allows your team to contribute to projects asynchronously. This enables analysis developers to work together even if their locations or schedules make pair programming or in-person code reviews difficult. It also makes it easy for a team to maintain a shared collection of commonly used code, which can further contribute to the standardization, reliability, and speed of analysis development.