2021 Fall FDD
FDD Fall 2021 planning team: Gavin Fay (UMass Dartmouth), Andy Jones (NEFSC)
- Cohort Webpage
- Blog Post
- Fay Lab Manual: a project of one of the teams, now widely used as an example of a ‘Code of Conduct’ and onboarding document for a team.
In September-October 2021, Openscapes led a 2-month Champions Cohort with Fisheries Dependent Data (FDD) Users, with over 30 fisheries scientists from across academia and NOAA. These scientists were interested in exploring new approaches to working with FDD, a complex mix of data and information collected to facilitate managing the region’s living marine resources. In the US Northeast, data flow from individual businesses and/or scientific samplers to the region’s scientific and management organizations. This web of data and information can be difficult to access because much of the content is confidential in its raw form. Further, many of the codes and systems used to store these data can be poorly documented, and even routine analyses are not commonly shared among data users. The FDD Openscapes Cohort wanted to explore how FDD could be used in new and innovative ways. It was part of ongoing efforts to provide access and documentation and to cultivate a community of practice focused on using these data and associated resources to their full potential.
What did the participants achieve?
from the Openscapes blog post
This was an exciting cohort because participants were focused on a common topic: research using FDD. We had both data users (largely academic) and data providers (NOAA Northeast Fisheries Science Center) in the Cohort, and they were able to share common issues and approaches that could benefit others in the group. Many of the topics and themes that resonated with the 2021 FDD Cohort overlapped and reinforced each other. Here are a few examples.
Sharing data isn’t the only way to practice open science. Many aspects of FDD can be confidential, meaning that the data cannot be shared publicly or have special considerations for sharing. We discussed ways to be open and collaborative even if you can’t share the data itself. For example, you can share documentation and metadata. Making knitted (RMarkdown) documents that show data summaries even when raw data can’t be accessed is another strategy. Another approach is to provide a “toy” or simulated dataset with all the right fields so that code can be developed and then shared with a partner who has access to the confidential data and can run the code on their own. Further, location data can be jittered, vessel IDs can be anonymized, and data can be collapsed or summarized to protect the confidential aspects of the dataset before sharing it broadly. We also discussed different definitions and examples of confidential data, and how folks treat it differently within an agency and outwardly for users. Do we all agree on what confidential data actually is? Does everyone know how to handle confidential data properly?
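As a rough illustration of the masking ideas discussed here (a toy dataset, anonymized vessel IDs, jittered locations, and summarized data), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names (`vessel_id`, `lat`, `lon`, `landed_kg`), the function names, the jitter size, the grid size, and the minimum-vessel threshold are hypothetical and do not reflect any actual FDD schema or confidentiality policy. Python is used only as an example language; the same steps would translate to R.

```python
import hashlib
import math
import random
from collections import defaultdict


def make_toy_records(n, seed=0):
    """Simulate records with the same (hypothetical) fields as a real table."""
    rng = random.Random(seed)
    vessels = [f"V{i:03d}" for i in range(5)]
    return [
        {
            "vessel_id": rng.choice(vessels),
            "lat": rng.uniform(40.0, 44.0),
            "lon": rng.uniform(-72.0, -66.0),
            "landed_kg": rng.uniform(50.0, 500.0),
        }
        for _ in range(n)
    ]


def anonymize_id(vessel_id, salt):
    """Replace an ID with a salted hash; the salt must be kept private."""
    digest = hashlib.sha256((salt + vessel_id).encode("utf-8")).hexdigest()
    return digest[:10]


def jitter(value, max_offset, rng):
    """Shift a coordinate by a random amount of at most max_offset degrees."""
    return value + rng.uniform(-max_offset, max_offset)


def mask_records(records, salt, max_offset=0.1, seed=1):
    """Return copies of the records with hashed IDs and jittered positions."""
    rng = random.Random(seed)
    return [
        {
            "vessel_id": anonymize_id(r["vessel_id"], salt),
            "lat": jitter(r["lat"], max_offset, rng),
            "lon": jitter(r["lon"], max_offset, rng),
            "landed_kg": r["landed_kg"],
        }
        for r in records
    ]


def summarize_by_cell(records, cell_size=1.0, min_vessels=3):
    """Total landings per grid cell, dropping cells with too few vessels.

    The min_vessels threshold is an illustrative assumption, not a policy.
    """
    totals = defaultdict(float)
    vessels = defaultdict(set)
    for r in records:
        cell = (math.floor(r["lat"] / cell_size),
                math.floor(r["lon"] / cell_size))
        totals[cell] += r["landed_kg"]
        vessels[cell].add(r["vessel_id"])
    return {c: totals[c] for c in totals if len(vessels[c]) >= min_vessels}


if __name__ == "__main__":
    toy = make_toy_records(200, seed=42)
    masked = mask_records(toy, salt="keep-this-secret")
    for cell, total in sorted(summarize_by_cell(masked).items()):
        print(cell, round(total, 1))
```

A collaborator without access to the confidential data could develop analysis code against the output of `make_toy_records`, while someone with access could run `mask_records` and `summarize_by_cell` on the real table before sharing results. Note that a salted hash only pseudonymizes IDs; whether that, or any given jitter or aggregation level, is sufficient would depend on the data provider's own confidentiality rules.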
Psychological safety. When we discussed the importance of psychological safety, participants had great ideas for how to help foster more trust and safety in teams. Examples included asking “what questions do you have?” instead of “do you have any questions?”, and creating spaces where individuals can be vulnerable, to role model and encourage group vulnerability and trust. The cohort emphasized that building trust and vulnerability takes time; it is built and earned through little moments and actions, not given outright, and it’s important for those who hold power (PIs, supervisors, managers, etc.) to lead by example and make space for those little moments within their teams. Others noted how important psychological safety is in this area, where data sets are often not well documented and communication is essential to the proper use of the data.
Tooling helped teams collaborate, but GitHub and Slack are not the only options! Using tools like Slack and GitHub can help some teams streamline and advance their communication and collaboration workflows, but that doesn’t mean those specific tools will work for everyone. Some individuals and teams in the FDD Cohort were already avid R/GitHub users, others were interested in using those tools more (GitHub Projects and Issues, anyone?), and some individuals were not as keen on embedding a whole new piece of software like GitHub or Slack into their workflows. Regardless of any individual’s interest or current abilities with a certain tool, everyone was able to use the Openscapes mindset to think about the tools and workflows they currently use, explore how they are serving their teams (or not), and brainstorm ways to improve their data and communication workflows independent of any specific tool. Some teams in the FDD Cohort regularly work with partners who may not have access to such tools, or even to the Internet. Teams began to discuss what to do in those situations and used their Seaside Chats to brainstorm and find solutions that work for them.
Code of Conduct for Data? In our final session, one group shared that in their Pathway they had thought a lot about codes of conduct and how they could be used to be clearer about how to work with data, data access, and acknowledging contributors across multiple institutions. Other groups agreed, thinking that this could be another form of metadata that would be very helpful to refer to before, during, and after the project. It could also help streamline onboarding of team members as they enter projects, and offboarding of team members before they move on. The question of “Who owns the data?”, from principles of anti-colonial science, was an important framing, and folks recommended viewing the recording of Dr. Max Liboiron’s recent keynote about research, communication, and land relations.