Introduction to Open Science: Why data versioning and data care practices are key for science and social science.

Introduction to Open Science: Why data versioning and data care practices are key for science and social science.

A significant shift in how researchers approach their data is needed if transparent and reproducible research practices are to be broadly advanced. Carly Strasserhas put together a useful guide to embracing open science, pitched largely at graduate students. But the tips shared will be of interest far beyond the completion of a PhD. If time is spent up front thinking about file organization, sample naming schemes, backup plans, and quality control measures, many hours of heartache can be averted.

I’m guessing that first year graduate students are knee-deep in courses, barely considering their potential thesis project. But for those that can multi-task, I have compiled this list of 10 things that you should undertake in your first year as a grad student. These aren’t just any 10 things… they are 10 steps you can take to make sure you contribute to a culture shift towards open science. Some a big steps, and others are small, but they will all get you (and the rest of your field) one step closer to reproducible, transparent research.

1. Learn to code in some language. Any language.

Here’s the deal: it’s easier to use black-box applications to run your analyses than to create scripts. Everyone knows this. You put in some numbers and out pop your results; you’re ready to write up your paper and get that H-index headed upwards. But this approach will not cut the mustard for much longer in the research world. Researchers need to know about how to code. Growing amounts and diversity of data, more interdisciplinary collaborators, and increasing complexity of analyses mean that no longer can black-box models, software, and applications be used in research. The truth is, if you want your research to be reproducible and transparent, you must code. In a 2013 article “The Big Data Brain Drain: Why Science is in Trouble“, Jake Vanderplas argues that

I learned MATLAB in graduate school, and experimented with R during a postdoc. I wish I’d delved into this world earlier, and had more skills and knowledge about best practices for scientific software. Basically, I wish I had attended a Software Carpentry bootcamp.

The growing number of Software Carpentry (SWC) bootcamps are more evidence that researchers are increasingly aware of the importance of coding and reproducibility. These bootcamps teach researchers the basics of coding, version control, and similar topics, with the potential for customizing the course’s content to the primary discipline of the audience. I’m a big fan of SWC – read more in my blog post on the organization. Check out SWC founder Greg Wilson’s article on some insights from his years in teaching bootcamps: Software Carpentry: Lessons Learned.

2. Stop using Excel. Or at least stop ONLY using Excel.

Most seasoned researchers know that Microsoft Excel can be potentially problematic for data management: there are loads of ways to manipulate, edit, reorder, and change your data without really knowing exactly what you did. In nerd terms, the trail of dataset changes is known as provenance; generally Excel is terrible at documenting provenance. I wrote about this a few years ago on the blog, and we mentioned a few of the more egregious ways people abuse Excel in ourF1000Research publication on the DataUp tool. More recently guest blogger Kara Woo wrote a great post about struggles with dates in Excel.

Of course, everyone uses Excel. In our surveys for the DataUp project, about 88% of the researchers we interviewed used Excel at some point in their research. And we can’t expect folks to stop using it: it’s a great tool! It should, however, be used carefully. For instance, don’t manipulate the sole copy of your raw data in Excel; keep your raw data raw. Use Excel to explore your data, but use other tools to clean and analyze it, such as R, Python, or MATLAB (see #1 above on learning to code). For more help with spreadsheets, see our list of resources and tools: UC3 Spreadsheet Help.

3.

 

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

“I want to publish some open data. What do I do?”

13 Nov, 2016

So, open data is de rigueur at the moment and has been picking up pace over the last few years. …

Read more

Microsoft Open Data Project adopts new data use agreement for datasets

10 Dec, 2019

Last summer we announced Microsoft Research Open Data—an Azure-based repository-as-a-service for sharing datasets—to encourage the reproducibility of research and make …

Read more

The rise of autonomous systems will change the world

19 May, 2018

Harald Sack is Professor for Information Services Engineering at two of the most renowned research institutions in Europe:FIZ Karlsruhe and …

Read more

Recent Jobs

Senior Cloud Engineer (AWS, Snowflake)

Remote (United States (Nationwide))

9 May, 2024

Read More

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.