Getting started with Anaconda

Virtual environment

Let's open Anaconda Prompt (or Terminal if you're a Mac user).

The first step is to choose a name for the environment; usually it matches the project name. Here we create an environment called redcar with Python 3.7.4:

conda create --name redcar python=3.7.4

After you have created a new virtual environment, you need to activate it to use it. To do so, use the following command:

conda activate redcar
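To confirm that the activation worked, conda env list marks the active environment with an asterisk. In a script, a minimal sketch can check the CONDA_DEFAULT_ENV variable, which conda activate sets to the environment's name. check_env is a hypothetical helper, not a conda command, and this assumes a bash-like shell (Terminal or Git Bash rather than Command Prompt):

```shell
# check_env is a hypothetical helper, not part of conda.
# conda activate sets CONDA_DEFAULT_ENV to the active environment's name.
check_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
        echo "OK: '$1' is active"
    else
        # Report the mismatch on stderr and signal failure
        echo "Expected '$1', but the active environment is '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

Right after conda activate redcar, running check_env redcar should print OK: 'redcar' is active.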

Now you can install all the packages that you will use in this project. Let's try to install pandas:

conda install pandas

You can always install a specific version of a package (e.g., when a member of your project group isn't using the latest one):

conda install pandas==0.25

You can also install a set of packages at once. Just list them separated by spaces:

conda install matplotlib seaborn numpy

Let's take a look at which packages we have:

conda list

To export the environment, including its packages and their versions, to a single file, use:

conda env export > environment.yml

Alternatively, if you want to build an identical environment, you can use the following command:

conda list --explicit > spec-file.txt

Note that this specification file can only be used to recreate the environment on the same operating system (e.g., Windows).
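The full environment.yml pins build strings and platform-specific packages, so it often fails to rebuild on a teammate's different operating system as well. Recent conda versions can export only the packages you explicitly asked for with conda env export --from-history, which tends to travel better. The sketch below writes all three variants at once; export_env and the environment-history.yml file name are hypothetical choices, not conda conventions:

```shell
# export_env is a hypothetical helper; run it with the environment activated.
export_env() {
    # Exact versions and build strings of every package
    conda env export > environment.yml &&
    # Only the packages you explicitly installed: more portable across OSes
    conda env export --from-history > environment-history.yml &&
    # Direct package URLs: rebuilds an identical environment, same OS only
    conda list --explicit > spec-file.txt
}
```

Usage: export_env, then commit whichever files your team actually needs.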

Now let's check that these files work: go back to the base environment, delete redcar, and recreate it from one of the files:

conda deactivate
conda env remove --name redcar
# Default option
conda env create --name redcar --file environment.yml

# Or, if you used the second option
conda create --name redcar --file spec-file.txt
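Since these steps repeat every time you rebuild an environment, they can be bundled into a small function. A minimal sketch, assuming a bash-like shell and that it is run from the base environment inside the folder that holds the exported files; recreate_env is a hypothetical name:

```shell
# recreate_env is a hypothetical helper: run it from the base environment.
# It prefers environment.yml and falls back to spec-file.txt.
recreate_env() {
    if [ -f environment.yml ]; then
        conda env remove --name "$1" &&
        conda env create --name "$1" --file environment.yml
    elif [ -f spec-file.txt ]; then
        conda env remove --name "$1" &&
        conda create --name "$1" --file spec-file.txt
    else
        echo "Neither environment.yml nor spec-file.txt found" >&2
        return 1
    fi
}
```

Usage: recreate_env redcar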

Alright! We're back on track. We have a new virtual environment with a base set of packages to continue our work!

Installing a kernel for a new environment

For the new environment to work inside JupyterLab (or Jupyter Notebook), we need to tell JupyterLab that it exists. With redcar activated (and the ipykernel package installed in it, e.g. via conda install ipykernel), run:

python -m ipykernel install --user --name=redcar
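By default, the launcher labels the kernel with its --name. ipykernel install also accepts --display-name to set a friendlier label. A minimal sketch wrapping both options; add_kernel is a hypothetical helper, and it must be run with the target environment activated so that python points to that environment:

```shell
# add_kernel is a hypothetical helper; --name and --display-name are
# real options of "python -m ipykernel install".
add_kernel() {
    # Registers the active environment's Python as a kernel for this user
    python -m ipykernel install --user --name "$1" --display-name "Python ($1)"
}
```

Usage: add_kernel redcar, which should show up in the launcher as Python (redcar).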

Poof! You're done! Now you should see an extra icon called redcar to the right of the default Python 3 one.

After you've finished a project, you may want to delete the associated virtual environment and its kernel from JupyterLab (that is, get rid of an extra icon like myenv in the image below, or redcar in our case).

To do so, open Anaconda Prompt (or Terminal) and type:

jupyter kernelspec list

The output of this command is a list of all available kernels. For example, if you have only two kernels installed, redcar and the default Python 3, the output should look like this:

Available kernels:
  redcar     /home/user/.local/share/jupyter/kernels/redcar
  python3    /usr/local/share/jupyter/kernels/python3

Alright, let's delete redcar:

jupyter kernelspec uninstall redcar

Alright! We've just rolled JupyterLab back to its original state.
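Finishing a project usually means doing both clean-up steps together: unregistering the kernel and deleting the environment itself with conda env remove. A minimal sketch, assuming a bash-like shell and that you run it from the base environment; remove_project is a hypothetical helper, and jupyter kernelspec uninstall should still ask you to confirm before deleting anything:

```shell
# remove_project is a hypothetical helper: run it from the base environment.
remove_project() {
    # Unregister the kernel first, then delete the environment itself
    jupyter kernelspec uninstall "$1" &&
    conda env remove --name "$1"
}
```

Usage: remove_project redcar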

Cleaning up your base environment

Most probably, you didn't know that every single project should get its own environment. Instead, you were installing everything you wanted to try out into the base environment. And one sunny weekend you said to yourself: "Well, it's time for some cleaning!"

There are different ways to keep track of what was installed in your environments. As we know, the magic command is conda list. One option is to go through that list, pick the packages you don't use anymore and uninstall them with conda remove. Alternatively, you can type:

conda list --name base --rev

--rev here stands for "revision", and the output of this command is a set of "snapshots" of your base environment. For example, on 01-01-2020 you had rev 0, the very first version of your base environment right after you installed Anaconda. The good news is that you can roll back to these revisions! The bad news is that rolling back is unstable: in practice, it's better to roll back to rev 1 or later. We can do it with:

conda install --rev 1
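The commands above use the short --rev; the full flags are --revisions for conda list and --revision for conda install. A sketch that spells them out and targets base explicitly, so it doesn't depend on which environment is active; rollback_base is a hypothetical helper, and conda should still ask you to confirm before changing anything:

```shell
# rollback_base is a hypothetical helper; the revision defaults to 1.
rollback_base() {
    # Show the revision history first, then restore the chosen revision
    conda list --name base --revisions &&
    conda install --name base --revision "${1:-1}"
}
```

Usage: rollback_base, or rollback_base 3 for a later snapshot.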

See the install command there? Yes, it means that conda will install the packages you had in rev 1. The obvious problem is that some of them (installed back in, let's say, 2016) may not be available anymore. In that case, it's better to stop playing with revisions and install a fresh version of Anaconda instead.

Cookiecutter Data Science

The project template that we will use is Cookiecutter Data Science, designed by DrivenData. The project website says:

"Cookiecutter Data Science is a logical, reasonably standardized, but flexible project structure for doing and sharing data science work."

The first step is to open Anaconda Prompt (or Terminal), activate the virtual environment and install Cookiecutter:

conda activate redcar
pip install cookiecutter

Nice! The next step is to point Cookiecutter to a specific project template:

cookiecutter https://github.com/drivendata/cookiecutter-data-science

You'll get a set of questions such as:

  • project name and repo name (usually they're the same),

  • author name (name and surname, plus company if any),

  • short description of your project (a line or two for README.md),

  • license (read up on open-source licenses before choosing one),

  • S3 bucket and AWS profile (used by the Makefile to sync data with S3).

Let's fill them in!

Alright! Great success! Now it's time to continue this work with Git and GitHub.
