Memory usage of dataframe columns • inspectdf

Illustrative data: `starwars`

The examples below make use of the starwars and storms data from the dplyr package.

# some example data
data(starwars, package = "dplyr")
data(storms, package = "dplyr")

For illustrating comparisons of dataframes, use the starwars data to produce two new dataframes. star_1 and star_2 each contain 50 randomly sampled rows of the original, and star_2 additionally drops the first two columns (name and height).

library(dplyr)
star_1 <- starwars %>% sample_n(50)
star_2 <- starwars %>% sample_n(50) %>% select(-1, -2)
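Because no random seed is set, star_1 and star_2 contain different rows every time the code is run. The sketch below makes the sampling reproducible with set.seed(). It also uses slice_sample(), which dplyr 1.0.0 introduced as the successor to sample_n(). The seed value is an arbitrary choice, and the outputs shown further down came from an unseeded draw, so a seeded run will not reproduce them exactly.

```r
library(dplyr)
data(starwars, package = "dplyr")

# Fix the random number generator so the same rows are drawn on every run.
# The seed value 2024 is arbitrary.
set.seed(2024)

# slice_sample() is the dplyr >= 1.0.0 replacement for sample_n()
star_1 <- starwars %>% slice_sample(n = 50)
star_2 <- starwars %>% slice_sample(n = 50) %>% select(-1, -2)  # drops name and height
```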


`inspect_mem()` for a single dataframe

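To explore the memory usage of the columns in a dataframe, use inspect_mem(). The command returns a tibble with one row per column of the input. Each row gives the column's size in bytes (bytes), a human-readable size (size) and its percentage of the total (pcnt), sorted from largest to smallest.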

library(inspectdf)
inspect_mem(starwars)

## # A tibble: 14 × 4
##    col_name   bytes size        pcnt
##    <chr>      <int> <chr>      <dbl>
##  1 films      20008 19.54 Kb  35.9  
##  2 starships   7448 7.27 Kb   13.4  
##  3 name        6280 6.13 Kb   11.3  
##  4 vehicles    5944 5.8 Kb    10.7  
##  5 homeworld   3608 3.52 Kb    6.48 
##  6 species     2952 2.88 Kb    5.30 
##  7 skin_color  2656 2.59 Kb    4.77 
##  8 eye_color   1608 1.57 Kb    2.89 
##  9 hair_color  1440 1.41 Kb    2.59 
## 10 sex          976 976 bytes  1.75 
## 11 gender       872 872 bytes  1.57 
## 12 mass         744 744 bytes  1.34 
## 13 birth_year   744 744 bytes  1.34 
## 14 height       400 400 bytes  0.718
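The table above can be approximated in base R, which helps explain where the numbers come from. The sketch below measures each column on its own with utils::object.size() and then expresses each size as a share of the summed column sizes. It also labels sizes using the convention the size column appears to follow (1 Kb = 1024 bytes, rounded to two decimals). For example, 20008 / 1024 = 19.54 Kb for films.

This is an assumption inferred from the printed output, not a description of inspectdf's internals. The helper fmt_size() is hypothetical, and it does not handle larger units such as Mb.

```r
# Load the example data (dplyr needs to be installed, not attached)
data(starwars, package = "dplyr")

# Size of each column measured separately with base R's object.size().
# Assumption: this approximates, but is not guaranteed to match, inspect_mem().
bytes <- vapply(starwars, function(col) as.numeric(utils::object.size(col)), numeric(1))

mem_summary <- data.frame(
  col_name  = names(bytes),
  bytes     = bytes,
  pcnt      = 100 * bytes / sum(bytes),  # share of the summed column sizes
  row.names = NULL
)

# Hypothetical helper: labels in the style of the `size` column
# (below 1024 -> "N bytes", otherwise Kb rounded to two decimals)
fmt_size <- function(b) {
  ifelse(b < 1024, paste(b, "bytes"), paste(round(b / 1024, 2), "Kb"))
}
mem_summary$size <- fmt_size(mem_summary$bytes)

# Largest columns first, as in the inspect_mem() output
mem_summary <- mem_summary[order(mem_summary$bytes, decreasing = TRUE), ]
mem_summary
```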

A barplot can be produced by passing the result to show_plot():

inspect_mem(starwars) %>% show_plot()
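show_plot() returns a ready-made chart. If a customised version is needed, a similar bar chart can be drawn directly from the tibble with ggplot2. The sketch below relies only on the col_name and pcnt columns shown in the output above. It is not a reproduction of what show_plot() draws, and the axis label is an arbitrary choice.

```r
library(inspectdf)
library(ggplot2)
data(starwars, package = "dplyr")

mem <- inspect_mem(starwars)

# One horizontal bar per column; the longest bar is the largest share of memory
p <- ggplot(mem, aes(x = reorder(col_name, pcnt), y = pcnt)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "% of total column memory")
print(p)
```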

`inspect_mem()` for two dataframes

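When a second dataframe is provided, inspect_mem() returns a tibble comparing the size of each column in the two input dataframes. The summaries for the first and second dataframes are shown in columns whose names end in _1 and _2, respectively. A column that is missing from one of the dataframes is reported as NA for that dataframe. Here, name and height are NA for star_2 because they were dropped.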

inspect_mem(star_1, star_2)

## # A tibble: 14 × 5
##    col_name   size_1    size_2    pcnt_1 pcnt_2
##    <chr>      <chr>     <chr>      <dbl>  <dbl>
##  1 films      10.72 Kb  11.58 Kb  33.0    39.2 
##  2 starships  4.45 Kb   4.77 Kb   13.7    16.2 
##  3 name       3.55 Kb   NA        10.9    NA   
##  4 vehicles   3.23 Kb   3.45 Kb    9.93   11.7 
##  5 homeworld  2.49 Kb   2.12 Kb    7.67    7.20
##  6 species    1.87 Kb   1.77 Kb    5.74    6.01
##  7 skin_color 1.85 Kb   1.55 Kb    5.70    5.24
##  8 eye_color  1.17 Kb   1.23 Kb    3.60    4.15
##  9 hair_color 856 bytes 1.01 Kb    2.57    3.41
## 10 sex        680 bytes 616 bytes  2.04    2.04
## 11 gender     576 bytes 576 bytes  1.73    1.91
## 12 mass       448 bytes 448 bytes  1.35    1.48
## 13 birth_year 448 bytes 448 bytes  1.35    1.48
## 14 height     248 bytes NA         0.745  NA

inspect_mem(star_1, star_2) %>% show_plot()
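The comparison tibble can be filtered like any other tibble. The sketch below splits it into two parts:

- columns present in only one dataframe (those with an NA percentage);
- the change in each shared column's share of memory, in percentage points.

It assumes the col_name, pcnt_1 and pcnt_2 column names shown in the output above. The seed and the names only_one, shift and pcnt_change are illustrative choices.

```r
library(dplyr)
library(inspectdf)
data(starwars, package = "dplyr")

set.seed(2024)  # arbitrary seed, only for reproducibility
star_1 <- starwars %>% slice_sample(n = 50)
star_2 <- starwars %>% slice_sample(n = 50) %>% select(-1, -2)

cmp <- inspect_mem(star_1, star_2)

# Columns found in only one dataframe have an NA percentage on the other side
only_one <- cmp %>% filter(is.na(pcnt_1) | is.na(pcnt_2))

# For shared columns: how far the share of memory moved, largest shifts first
shift <- cmp %>%
  filter(!is.na(pcnt_1), !is.na(pcnt_2)) %>%
  mutate(pcnt_change = pcnt_2 - pcnt_1) %>%
  arrange(desc(abs(pcnt_change)))
```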
