REDCAR
  • Starting page
  • Workshop 1 Reproducible
    • Getting started with Anaconda
    • Getting started with Git
    • Setting up Git extension for JupyterLab
    • Setting up Jupyter Notebooks in VS Code
  • Workshop 2 Understandable
    • Workflows
    • Better code
  • Workshop 3 Shared
    • Sharing for Accessibility
    • Sharing for Collaboration
Powered by GitBook
On this page
  • Introduction
  • Sharing for Accessibility
  • Sharing for Collaboration
  • A short note on another options
  • Agenda

Was this helpful?

Workshop 3 Shared

PreviousBetter codeNextSharing for Accessibility

Last updated 5 years ago

Was this helpful?

Introduction

The goals of this workshop are to:

  • cover some tools and best practices for sharing your work that can make it easier for others to access (and also reproduce!),

  • examine a few ways you can collectively work on a project together, and

  • practice these concepts hands-on.

We will mostly cover (a specific implementation of the JupyterHub project) and . We'll briefly also mention some cloud storage methods you can use, but won't cover that in-depth.

Sharing for Accessibility

So your code is now more reproducible and understandable. Great. Let's tell the world! Hello world!

However, that person needs to also have a preliminary understanding how environments and package management work. They also need access to a laptop where they can install software freely. What if someone doesn't understand what an environment is? What if they have a company computer they can't install Python and Jupyter on or want to see your work from a phone or tablet (most internet users are mobile these days)?

MyBinder.org is a free cloud service that gives anyone browser-based access to a Jupyter or RStudio interface that can run your code on a remote server. In other words, as long as your repository is configured neatly, anyone with internet access can run your code without needing to configure anything!

Sharing for Collaboration

Another option is to use something like Google Colab, which is similar to Binder, but doesn't necessarily need a Git repository behind it; you can simply open Colab, upload files, and save it on the (Google) cloud. You can share a link with others and they will be able to see exactly what you do, but not in real-time. Colab is similar to Google Docs except it does not support live coding, where two people can edit and run a notebook simultaneously.

A short note on another options

Always keep in mind the potential drawbacks of such services. You create and work with the scripts and notebooks online. As a result, if you've lost the internet connection, you can also lose some parts for your work. Without a doubt, there is always a certain backup mechanism. However, stuff happens, you know...

Finally, privacy. Make sure that you've read the terms and conditions. You don't want to lose the rights of your code or data, right? If the terms and conditions are too long and written in a tricky bureaucratic way, ask fellow programmers on Stack Overflow.

Agenda

When?

What?

10:15 – 10:30

Getting ready up with BBB

10:30 – 10:45

Recap of workshops 1 and 2

10:45 – 11:00

REDCAR project and workshop 3 introduction

11:00 – 11:30

Sharing for Accessibility

11:30 – 11:45

Break

11:45 – 12:15

Sharing for Collaboration

12:15 – 13:00

Debugging to get it working for you + discussion (?) + Closing

We already showed you in how to put your Git-tracked repository of work onto GitHub, a cloud Git repository. We talked about how we can manage that repository to make it easy for someone to "clone" or download and run on their own computer.

The other main reason to share our scientific work is for others to collaborate with use. Again, Git via GitHub can handle a lot of that, but we haven't covered multi-user Git (branching, merging, pull requests, and resolving merge conflicts). Unfortunately, we won't here either, but one of our colleagues shared a nice interactive game that can teach you it: .

There are options for real-time Notebook editing, but we haven't tested them yet. CoCalc is a paid option and is in a closed beta at the time of writing.

Of course, as always, there are other services that provide similar solutions

(free and paid tiers)

(Machine Learning-oriented – it's expensive but powerful)

For R you can use . The service requires you to register and right after you'll get access to your workspace. There, in similar to RStudio fashion, you can create a project, make scripts and use the whole potential beloved RStudio.

Second, computational power and storage capacity. Of course, without paying this services extra fee, you won't reach the maximum . Though, that poses the question: do you even need this maximum? If not, you're safe, keep up good work. Otherwise, think about the heavy part do it locally and share only the resulting notebooks. For example, I want to perform clustering on a huge data set out of 1 billion samples. So, I'm doing it on my , saving the obtained clusters and loading them into a notebook. No need to use cloud services for that .

🚀
https://learngitbranching.js.org
Deepnote
Kaggle Notebooks
Microsoft Azure Notebooks
Amazon Sagemaker
IBM Notebooks
RStudio Cloud
💯
🎆
super powerful laptop
MyBinder
Google Colab
workshop 1