File Organization

Teaching: 30 min
Exercises: 10 min

Questions:
  • What are the common file organization errors?
  • What are best practices for file organization?

Objectives:
  • Highlight common SNAFUs.

A place for everything, everything in its place - Benjamin Franklin

### Data analysis workflow


#### Face it…

#### Mighty weapon

### Organizing your data analysis workflow

#### Raw data $\rightarrow$ data

Pick a strategy, any strategy. Just pick one and stick to it!
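Here is a minimal sketch of one such strategy. It assumes raw files are never edited in place and that cleaned versions always go to a separate directory. The `data/raw/` and `data/clean/` names and the `clean_path` helper are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Hypothetical layout: untouched inputs live in data/raw/,
# ready-to-analyze versions live in data/clean/.
RAW_DIR = Path("data") / "raw"
CLEAN_DIR = Path("data") / "clean"


def clean_path(raw_file: Path) -> Path:
    """Map a raw input file to the path its cleaned version should use.

    Raises ValueError if the file is not under RAW_DIR, so a cleaning
    step cannot accidentally treat a derived file as raw input.
    """
    try:
        relative = raw_file.relative_to(RAW_DIR)
    except ValueError:
        raise ValueError(f"{raw_file} is not under {RAW_DIR}") from None
    # Same file name and subdirectory structure, but under CLEAN_DIR,
    # so the raw original is never overwritten.
    return CLEAN_DIR / relative
```

The point of the design is that the mapping is one-way. Cleaning code asks where its output goes and never gets back a path inside `data/raw/`.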


#### Data $\rightarrow$ results

The same rule applies to getting from data to results: pick a strategy and stick to it.
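One way to make the data-to-results direction hard to violate is to add a small check at the top of each analysis script. This sketch assumes a hypothetical convention where every input lives under `data/` and every output under `results/`. `check_flow` and `is_within` are illustrative names and do not come from any library.

```python
from pathlib import PurePath

# Hypothetical convention for the data -> results step:
# every input comes from data/, every output goes to results/.
DATA_DIR = PurePath("data")
RESULTS_DIR = PurePath("results")


def is_within(path: PurePath, directory: PurePath) -> bool:
    """Lexically check that `path` sits under `directory` (no '..' allowed)."""
    return directory in path.parents and ".." not in path.parts


def check_flow(inputs: list[PurePath], outputs: list[PurePath]) -> None:
    """Raise ValueError if a step reads or writes outside its expected directory."""
    for p in inputs:
        if not is_within(p, DATA_DIR):
            raise ValueError(f"input {p} is not under {DATA_DIR}/")
    for p in outputs:
        if not is_within(p, RESULTS_DIR):
            raise ValueError(f"output {p} is not under {RESULTS_DIR}/")
```

The check is purely lexical and does not resolve symlinks. It only catches the common slip of writing an output next to its input.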


### A real (and imperfect!) example


#### Data

Ready-to-analyze data:


Raw data:


#### Analysis and figures

R scripts, plus the Markdown files produced by RStudio’s “Compile Notebook”:


The figures created in those R scripts and linked in those Markdown files:


#### Scripts

A linear progression of R scripts, and a Makefile to run the entire analysis:
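The Makefile's job here is to run the whole analysis in order. The Python sketch below is not that Makefile. It is a rough stand-in that assumes each script carries a numeric prefix (for example `01_clean.R`, a hypothetical name) and runs the scripts one after another with `Rscript`, stopping at the first failure.

```python
import re
import subprocess
from pathlib import Path

# Assumes scripts are named with a numeric prefix, e.g. 01_clean.R, 02_fit.R
# (hypothetical names). Files without a prefix are ignored.
PREFIX = re.compile(r"^(\d+)_")


def ordered_scripts(names: list[str]) -> list[str]:
    """Return .R script names sorted by their integer prefix."""
    numbered = []
    for name in names:
        match = PREFIX.match(name)
        if match and name.endswith(".R"):
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def run_all(script_dir: Path) -> None:
    """Run every numbered script in order with Rscript."""
    names = [p.name for p in script_dir.iterdir() if p.is_file()]
    for name in ordered_scripts(names):
        # check=True raises CalledProcessError if a script exits non-zero,
        # so later steps never run on the output of a failed step.
        subprocess.run(["Rscript", name], cwd=script_dir, check=True)
```

For example, `run_all(Path("analysis"))` would run the numbered scripts in an `analysis/` directory. What `make` adds on top of this loop is dependency tracking: it reruns a step only when that step's inputs are newer than its outputs.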


#### Results

Tab-delimited files with one row per gene, holding parameter estimates, test statistics and so on:
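You can write a file like this with nothing beyond the standard library. In this sketch the column names (`gene`, `estimate`, `statistic`, `p_value`) are hypothetical placeholders for whatever the model actually produces.

```python
import csv
from typing import Iterable, TextIO

# Hypothetical column set; substitute the estimates and statistics your model yields.
FIELDS = ["gene", "estimate", "statistic", "p_value"]


def write_results(rows: Iterable[dict], handle: TextIO) -> None:
    """Write one row per gene as tab-delimited text, preceded by a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def read_results(handle: TextIO) -> list[dict]:
    """Read a results file back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(handle, delimiter="\t"))
```

When writing to disk, open the file with `newline=""`, as the `csv` module documentation recommends. For example: `with open("results/estimates.tsv", "w", newline="") as fh: write_results(rows, fh)`, where the path is again only illustrative.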


#### Expository files

Files to help collaborators understand the model we fit: some Markdown docs, a Keynote presentation, and the Keynote slides exported as PNGs so they can be viewed on GitHub.


#### Caveats / problems with this example

#### Wins of this example


### Other tips

#### Tip: the `from_joe` directory

#### Tip: give yourself less rope

#### Tip: prose

#### Tip: life cycle of data

Here’s how most data analyses go down in reality:

- You get raw data.
- You explore, describe and visualize it.
- You diagnose what this data needs to become useful.
- You fix, clean and marshal the data into ready-to-analyze form.
- You visualize it some more.
- You fit a model (or whatever) and write lots of numerical results to file.
- You make prettier tables and many figures based on the data and results accumulated by this point.

Both the data file(s) and the code/scripts that act on them reflect this progression.
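A lightweight way to make file names reflect this progression is to give each stage a zero-padded numeric prefix. A plain alphabetical listing then shows the files in pipeline order. The stage labels below are a hypothetical condensation of the list above, not names taken from the example project.

```python
# Hypothetical stage labels condensed from the life cycle above.
STAGES = ["explore", "clean", "visualize", "fit-model", "tables-figures"]


def stage_names(stages: list[str], ext: str = ".R", width: int = 2) -> list[str]:
    """Prefix each stage with a zero-padded number so that an alphabetical
    listing (ls, a file browser, GitHub) matches the pipeline order."""
    return [f"{i:0{width}d}_{label}{ext}" for i, label in enumerate(stages, start=1)]
```

Without the padding, a file starting `10_` would sort before one starting `2_`. Calling `stage_names(STAGES, ext=".tsv")` gives matching names for the intermediate data files, which keeps scripts and data lined up.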

#### Prepare data $\rightarrow$ Do stats $\rightarrow$ Make tables & figs

The R scripts:


The figures left behind:


### Recap

File organization should reflect inputs vs. outputs and the flow of information.

    drwxr-xr-x  20 jenny  staff        680 Apr 14 15:44 analysis
    drwxr-xr-x   7 jenny  staff        238 Jun  3  2014 data
    drwxr-xr-x  22 jenny  staff        748 Jun 23  2014 model-exposition
    drwxr-xr-x   4 jenny  staff        136 Jun  3  2014 results
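Starting a new project with the same top-level split takes only a few lines. This sketch creates the four directories shown in the listing above. The `make_skeleton` name is hypothetical, and `exist_ok=True` makes the function safe to run more than once.

```python
from pathlib import Path

# Top-level directories taken from the listing above; adapt them to your own project.
TOP_LEVEL = ["analysis", "data", "model-exposition", "results"]


def make_skeleton(root: Path) -> list[Path]:
    """Create the top-level directories under `root` (idempotent) and return them."""
    created = []
    for name in TOP_LEVEL:
        path = root / name
        # parents=True also creates `root` itself if needed;
        # exist_ok=True means re-running does not raise.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```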


### Key Points