REDCAR
Workshop 2 Understandable


Last updated 5 years ago


Introduction

The goals of the second workshop are:

  • show you how to organize your study so that others (fellow students, researchers or programmers) can easily understand what an amazing job you did;

  • remind you how to name variables and write functions effectively, and why code formatting is important (PEP-8, Black, etc.);

  • practice it on a case study.

Workflows

Let's talk about workflows first. What is a workflow in simple terms? Well, it's a flow of work: the steps that you need to take to solve a problem or complete a task within a domain (e.g. data science, engineering). A workflow can be described in great detail, down to the tools that support each action, or at a higher level (the so-called step-wise approach). Here is a formal definition from the Business Process Management Center of Excellence Glossary (Wikipedia contributors, 2020):

"Workflow is an orchestrated and repeatable pattern of business activity enabled by the systematic organization of resources into processes that transform materials, provide services, or process information."

Even though different domains require domain-specific knowledge to define a workflow and follow it, there are certain similarities across problem-oriented workflows. You start with a problem, do something in the middle and propose a solution or an insight at the end. A great example is the famous underpants gnomes profit plan.

Such a high-level similarity can help establish transdisciplinary collaboration and promote a joint understanding of the work process.

"Successful problem solving requires finding the right solution to the right problem. We fail more often because we solve the wrong problem than because we get the wrong solution to the right problem."

Finally, students should remember that the workflow ≠ thesis structure. While it seems attractive to have a set of predefined steps, the modeling process is not the same as the scientific research process. Models serve to help answer research questions, propose solutions or evaluate policies. They are instruments of analysis.

Better code

Here we have two things to discuss: how to organize your project (the files) and how you write your code.

It is important how you name your variables and functions and how you comment the code. Consider a variable blah4. Yes, that's a real variable! It is a data frame with a bus schedule inside. The name was given by a junior employee of a mid-size data science company working in sustainable transportation. And you know what, it is not a problem! First make it work, then make it right! The problems begin when he is placed on another project. When he comes back after a week or a month, how easily will he figure out what blah4 stands for? So instead of exercising your memory, try to name your variables in a human-readable format: bus_schedule. Easy-peasy, right?
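To make the renaming concrete, here is a minimal sketch. The schedule rows, the column names and both helper functions are made up for illustration. The original blah4 was a data frame; a plain list of dictionaries is assumed here only to keep the example self-contained.

```python
# Hypothetical bus schedule rows; the real data frame is not shown in the workshop.
bus_schedule = [
    {"line": "40", "stop": "Station", "departure": "08:05"},
    {"line": "40", "stop": "Campus", "departure": "08:17"},
    {"line": "69", "stop": "Station", "departure": "08:10"},
]

# Hard to read: what do blah4, f, x and y mean a month from now?
blah4 = bus_schedule


def f(x, y):
    return [r["departure"] for r in x if r["line"] == y]


# Easier to read: the names say what the data and the function are about.
def departures_for_line(schedule, line):
    """Return the departure times of one bus line, in schedule order."""
    return [row["departure"] for row in schedule if row["line"] == line]


line_40_departures = departures_for_line(bus_schedule, "40")
print(line_40_departures)
```

Both functions do exactly the same thing; only the second one can be understood without opening the data first.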

In short, the code is much easier to follow when:

  • we follow a certain workflow,

  • name our notebooks according to the step,

  • choose clear variable and function names.

Case study

Let us practice these and the tools from Workshop 1 Reproducible on a case study. We prepared 3 options for you:

  1. Exploring emergency calls in the Netherlands;

  2. Understanding dynamics of COVID-19 in the Netherlands;

  3. A project that you are currently working on.

We do not state a goal for each of these studies; you can formulate it yourself. But if you feel lost, one option is suggested for each of the first two case studies.

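The data can be found here:

№ | Original data sets | Processed sample data sets
1 | 112 Nederland; Kaart van 500 meter bij 500 meter met statistieken; Kerncijfers wijken en buurten 2019; Wijk- en buurtkaart 2019 | SURFdrive public link 1
2 | Coronavirus kaart van Nederland per gemeente; 2020 coronavirus pandemic in the Netherlands; Kerncijfers wijken en buurten 2019; Bestuurlijke Grenzen Extract 2020 (Actueel) | SURFdrive public link 2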

  1. Formulate a problem or a hypothesis;

  2. Get the data;

  3. Visualize it;

  4. Apply a simple model;

  5. Report your findings;

  6. Upload the work to GitHub.

References

  1. Hermans, L., & Cunningham, S. W. (2018). Actor and Strategy Models. Wiley Blackwell.

  2. Byrne, C. (2017). Development Workflows for Data Scientists. O’Reilly Media, Inc.

  3. Richardson, G. P., & Pugh III, A. L. (1981). Introduction to System Dynamics Modeling with DYNAMO. Productivity Press Inc.

Agenda

When? | What?
10:15 - 10:30 | Getting set up with BBB
10:30 - 10:45 | Recap of workshop 1
10:45 - 11:00 | REDCAR project and workshop 2 introduction
11:00 - 11:30 | Talking about workflows
11:30 - 11:45 | Break
11:45 - 12:15 | How to write better code?
12:15 - 13:00 | Starting with a case study

Workflows are usually depicted with a diagram. Data science studies are often depicted as directed acyclic graphs (DAGs), whereas in reality the process looks more like the System Dynamics one (going back and forth from one stage to another).
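The difference between the two pictures can be sketched in a few lines with Python's standard graphlib module (Python 3.9 or later). The step names are illustrative assumptions, not a prescribed workflow: a strictly linear workflow can be ordered as a DAG, while adding a single "go back" arrow creates a cycle and the DAG picture breaks down.

```python
from graphlib import CycleError, TopologicalSorter

# Each step maps to the set of steps that must be finished before it.
# The step names are illustrative, not a prescribed workflow.
linear_workflow = {
    "get_data": {"formulate_problem"},
    "visualize": {"get_data"},
    "apply_model": {"visualize"},
    "report": {"apply_model"},
}
step_order = list(TopologicalSorter(linear_workflow).static_order())
print(step_order)

# In practice the report often sends you back to the problem formulation.
# That feedback loop is a cycle, so the workflow is no longer a DAG.
iterative_workflow = dict(linear_workflow, formulate_problem={"report"})
try:
    list(TopologicalSorter(iterative_workflow).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

A DAG is therefore a convenient way to present a finished study, but a poor description of how the work actually proceeds.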

Sounds obvious, of course, but there are a couple of important implications to keep in mind. First, qualitative and quantitative workflows have a similar skeleton. Consider the step-wise approach proposed by Hermans & Cunningham (2018) and the workflow for data scientists by Byrne (2017).

Second, it is important to remember that you are addressing a problem or a question (and not the method). Identifying the "right" problem is time-consuming (as is finding an appropriate method). Remember Russell Ackoff's famous saying?

In the case of a data science project it may seem trivial. The methods allow you to predict, to cluster, or to try to explain a certain phenomenon. But when you start working, you may realize that there is a different angle that seems more promising. Just remember that deviating from the original problem, or doing research in an exploratory fashion, can on the one hand bring unexpected benefits, but on the other hand bring an extra burden and shift your deadlines.

The project structure follows from the chosen workflow. For example, if you are doing data science research and decide to follow the workflow from above, you should have Jupyter Notebooks and scripts dedicated to each of the "steps." Simulation studies conducted "purely" with programming languages (see the amazing and free Think Complexity by Allen Downey) follow the same logic (i.e. separate the simulation model from the analysis of experiments). If you are using dedicated simulation modeling software, for example Simio, you are forced to follow its internal logic.
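One possible way to let the project structure mirror the workflow is sketched below. The folder names, the step names and the numbering scheme are assumptions chosen for illustration, not a layout prescribed by the workshop. The script creates one numbered, empty placeholder file per step inside a temporary directory, so it can be run safely.

```python
import tempfile
from pathlib import Path

# Hypothetical workflow steps; replace them with the steps of your own workflow.
WORKFLOW_STEPS = ["get_data", "explore_data", "model", "report"]


def create_project_skeleton(root: Path, steps: list[str]) -> list[Path]:
    """Create data/ and notebooks/ folders and one numbered placeholder file per step."""
    (root / "data").mkdir(parents=True, exist_ok=True)
    notebooks_dir = root / "notebooks"
    notebooks_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for number, step in enumerate(steps, start=1):
        # Empty placeholder only; create the real notebook with Jupyter later.
        notebook = notebooks_dir / f"{number:02d}_{step}.ipynb"
        notebook.touch()
        created.append(notebook)
    return created


with tempfile.TemporaryDirectory() as tmp:
    notebooks = create_project_skeleton(Path(tmp) / "my_study", WORKFLOW_STEPS)
    notebook_names = [notebook.name for notebook in notebooks]
    print(notebook_names)
```

The numeric prefix keeps the files sorted in the order of the workflow, so a newcomer can read the study from top to bottom.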

Guess what is another way to reduce the effort needed to understand the code? Exactly! Commenting it! However, commenting is also tricky. You do not want to overwhelm yourself, a colleague or another researcher with a novel on how you did this and that. Here is a trick: if we follow a certain workflow, name our notebooks according to the steps, and choose sensible variable and function names, then far fewer comments are needed! And of course, communities help us with standards (e.g. PEP-8 or Vensim naming conventions).
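As an illustration of how good names reduce the need for comments, compare two versions of the same calculation. The functions and the waiting-time data are hypothetical and only serve as an example.

```python
# Version 1: cryptic names force a comment on every line.
def calc(a):
    # t is the sum of all waiting times
    t = sum(a)
    # n is the number of passengers
    n = len(a)
    # return the mean waiting time
    return t / n


# Version 2: PEP-8 style names and a docstring; one comment explains the "why".
def mean_waiting_time(waiting_times_min):
    """Return the mean waiting time in minutes."""
    # An empty list would cause a division by zero, so fail with a clear message.
    if not waiting_times_min:
        raise ValueError("waiting_times_min must not be empty")
    return sum(waiting_times_min) / len(waiting_times_min)


print(mean_waiting_time([2, 4, 9]))
```

In the second version the names carry the "what", so the only remaining comment explains a decision that the code itself cannot express.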

Option for case study 1: predict the number of firefighter calls based on the socio-demographic and housing variables (try AutoML);

Option for case study 2: cluster the positive test curves (see tslearn) and look for similarities between cities in the same cluster (e.g. population size).

Warning: These are just exercises that we propose you work on in class. Without a doubt, such a simple analysis cannot capture the whole complexity of any of these problems. So be aware and do not jump to conclusions.

Tip: Try the Wayback Machine to get past data. For example, you can complete the RIVM data set on COVID-19 by municipality with a Wayback Machine query.
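The query used in the workshop is not reproduced here. As a rough sketch of what such a query can look like, the snippet below builds a request URL for the Wayback Machine availability endpoint, which returns the archived snapshot closest to a given date. The page address is a placeholder and the helper function name is an assumption.

```python
from urllib.parse import urlencode

WAYBACK_AVAILABILITY_API = "https://archive.org/wayback/available"


def build_wayback_query(page_url: str, date_yyyymmdd: str) -> str:
    """Build a URL that asks the Wayback Machine for the snapshot closest to a date."""
    parameters = urlencode({"url": page_url, "timestamp": date_yyyymmdd})
    return f"{WAYBACK_AVAILABILITY_API}?{parameters}"


# Placeholder address: replace it with the page that publishes the data you need.
query_url = build_wayback_query("https://example.org/covid-19-by-municipality", "20200323")
print(query_url)
```

Opening the resulting URL in a browser returns a small JSON document that describes the closest archived snapshot, if one exists. Repeating the query for each missing date is one way to fill the gaps in a time series.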

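We expect you to do a minimalist study that follows the six numbered steps listed under the case study, from formulating a problem to uploading the work to GitHub.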

So, select an exercise and roll on! After you finish the exercise, we will be happy to check it out. Just send us the GitHub repo and we will send you our feedback.

Wikipedia contributors. (2020, March 23). Workflow. In Wikipedia, The Free Encyclopedia. Retrieved March 23, 2020, from https://en.wikipedia.org/w/index.php?title=Workflow&oldid=946935282
[Figure: Famous underpants gnomes profit plan]
[Figure: Overview of the SD modeling approach according to Richardson & Pugh (1981)]
[Figure: One representation of the data science process]
[Figure: Step-wise approach for actor network scanning]
[Figure: The perfect code quality measurement doesn't exi-]