
Centers for Disease Control and Prevention

Jan 19-20, 2016

9:00 am - 4:30 pm

Instructors: Chris Fonnesbeck, Emily Dolson

Helpers: TBD

General Information

Software Carpentry's mission is to help scientists and engineers get more research done in less time and with less pain by teaching them basic lab skills for scientific computing. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at CDC researchers who have already taken a half-day Git course and are familiar with basic Unix commands. You don't need to have any previous knowledge of Python.

Where: . Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop on which they will run a virtual machine. They must also abide by Software Carpentry's Code of Conduct.

Contact: Please email zno6@cdc.gov for more information.


Schedule

Day 1

09:00 Automating tasks with the Unix shell
10:30 Coffee
10:45 Intro to Python
12:00 Lunch break
13:00 Advanced Git + best practices for writing scientific software
14:30 Coffee
14:45 More Python basics
16:00 Wrap-up

Day 2

09:00 Scientific Programming in Python
10:30 Coffee
10:45 Intro to NumPy and SciPy
12:00 Lunch break
13:00 Data Wrangling with Pandas
14:30 Coffee
14:45 Intro to Python Data Visualization
16:00 Wrap-up

Etherpad: http://pad.software-carpentry.org/2016-01-19-cdc
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

The Unix Shell

  • Pipes and redirection
  • Looping over files
  • Creating and running shell scripts
  • Finding things
  • Reference...

Programming in Python

  • Using libraries
  • Working with arrays
  • Creating and using functions
  • Loops and conditionals
  • Defensive programming
  • Using Python from the command line
  • Reference...

Best practices for collaboration with Git

  • Automatic testing (continuous integration)
  • Norms and best practices for large-scale collaboration
  • Licenses
  • How to actually incorporate Git into your workflow (+ practice doing so!)
  • Reference...

Scientific Programming in Python

  • Basic arrays, dtypes and numerical operations
  • Indexing, slicing, reshaping and broadcasting
  • Random number generation and simulation
  • Fitting simple statistical models to data
  • Reference...

Data Wrangling with Pandas

  • Importing data
  • Series and DataFrame objects
  • Indexing, data selection and subsetting
  • Hierarchical indexing
  • Reading and writing files
  • Sorting and ranking
  • Missing data
  • Data summarization
  • Date/time types
  • Merging, joining, reshaping DataFrame objects
  • Data transformation
  • Data aggregation and GroupBy operations

Data Visualization

  • Plotting in Pandas vs Matplotlib
  • Plot layout
  • Creation of basic plot types
  • Grouped plots and trellis plots
  • Visualization best practices