maldipickr

library(maldipickr)

Quickstart

The {maldipickr} package helps microbiologists reduce duplicate or clonal bacteria in their cultures and, where needed, exclude previously selected bacteria. {maldipickr} achieves this by grouping together data from the MALDI Biotyper and helping to choose representative bacteria from each group using user-relevant metadata, a process known as cherry-picking.

{maldipickr} cherry-picks bacterial isolates with MALDI Biotyper:

Using taxonomic identification report

First, make sure {maldipickr} is installed and loaded; otherwise, follow the instructions to install the package.

Cherry-picking four isolates based on their taxonomic identification by the MALDI Biotyper is done in a few steps with {maldipickr}.

Get example data

We import an example Biotyper CSV report and glimpse at the table.

report_tbl <- read_biotyper_report(
  system.file("biotyper_unknown.csv", package = "maldipickr")
)
report_tbl %>%
  dplyr::select(name, bruker_species, bruker_log) %>% knitr::kable()
| name | bruker_species | bruker_log |
|:--|:--|--:|
| unknown_isolate_1 | not reliable identification | 1.33 |
| unknown_isolate_2 | not reliable identification | 1.40 |
| unknown_isolate_3 | Faecalibacterium prausnitzii | 1.96 |
| unknown_isolate_4 | Faecalibacterium prausnitzii | 2.07 |

Delineate clusters and cherry-pick

Delineate clusters from the identifications after filtering for the reliable ones, then cherry-pick one representative spectrum per cluster.

Identifications with a log-score below 2 are treated as unreliable and replaced by “not reliable identification”. Stay tuned, though: these isolates are not assumed to be the same, and each one ends up in its own cluster!

report_tbl <- report_tbl %>%
  dplyr::mutate(
      bruker_species = dplyr::if_else(bruker_log >= 2, bruker_species,
                                      "not reliable identification")
  )
knitr::kable(report_tbl)
| name | sample_name | hit_rank | bruker_quality | bruker_species | bruker_taxid | bruker_hash | bruker_log |
|:--|:--|--:|:--|:--|--:|:--|--:|
| unknown_isolate_1 | NA | 1 | - | not reliable identification | NA | 3e920566-2734-43dd-85d0-66cf23a2d6ef | 1.33 |
| unknown_isolate_2 | NA | 1 | - | not reliable identification | NA | 88a85875-eeb5-4858-966e-98a077325dc3 | 1.40 |
| unknown_isolate_3 | NA | 1 | + | not reliable identification | 137408536 | 2d266f20-5428-428d-96ec-ddd40200794b | 1.96 |
| unknown_isolate_4 | NA | 1 | +++ | Faecalibacterium prausnitzii | 137408536 | 2d266f20-5428-428d-96ec-ddd40200794b | 2.07 |
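The bruker_quality column in the table above appears to track the log-score. As a rough sketch, the symbols for these four rows can be reproduced with base R's cut(). The cut-off of 2 is the one used in the filter above; the cut-off of 1.7 is an assumption chosen to be consistent with the four rows shown and is not stated in this document. The variable names are hypothetical.

```r
# Log-scores copied from the table above.
bruker_log <- c(1.33, 1.40, 1.96, 2.07)

# Assumed cut-offs: [-Inf, 1.7) -> "-", [1.7, 2) -> "+", [2, Inf) -> "+++".
# Only the value 2 comes from the filter above; 1.7 is an assumption.
quality <- cut(
  bruker_log,
  breaks = c(-Inf, 1.7, 2, Inf),
  labels = c("-", "+", "+++"),
  right = FALSE # intervals are closed on the left, so 2.00 counts as "+++"
)

data.frame(bruker_log, quality)
```

Under these assumed cut-offs, only unknown_isolate_4 passes the `bruker_log >= 2` filter, which matches the table.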

The chosen ones are indicated by the to_pick column.

report_tbl %>%
  delineate_with_identification() %>%
  pick_spectra(report_tbl, criteria_column = "bruker_log") %>%
  dplyr::relocate(name, to_pick, bruker_species) %>% 
  knitr::kable()
#> Generating clusters from single report
| name | to_pick | bruker_species | membership | cluster_size | sample_name | hit_rank | bruker_quality | bruker_taxid | bruker_hash | bruker_log |
|:--|:--|:--|--:|--:|:--|--:|:--|--:|:--|--:|
| unknown_isolate_1 | TRUE | not reliable identification | 2 | 1 | NA | 1 | - | NA | 3e920566-2734-43dd-85d0-66cf23a2d6ef | 1.33 |
| unknown_isolate_2 | TRUE | not reliable identification | 3 | 1 | NA | 1 | - | NA | 88a85875-eeb5-4858-966e-98a077325dc3 | 1.40 |
| unknown_isolate_3 | TRUE | not reliable identification | 4 | 1 | NA | 1 | + | 137408536 | 2d266f20-5428-428d-96ec-ddd40200794b | 1.96 |
| unknown_isolate_4 | TRUE | Faecalibacterium prausnitzii | 1 | 1 | NA | 1 | +++ | 137408536 | 2d266f20-5428-428d-96ec-ddd40200794b | 2.07 |
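To make the result above easier to read, here is a minimal base R sketch of the same idea: isolates sharing a reliable identification share a cluster, every unreliable identification gets a cluster of its own, and the row with the highest log-score is picked in each cluster. This only illustrates the behaviour seen in the table; it is not the implementation of delineate_with_identification() or pick_spectra(), and the `label` helper and the numbering rule are assumptions made to mirror the output.

```r
# Toy copy of the four rows shown above (values taken from the table).
report <- data.frame(
  name = paste0("unknown_isolate_", 1:4),
  bruker_species = c(
    rep("not reliable identification", 3),
    "Faecalibacterium prausnitzii"
  ),
  bruker_log = c(1.33, 1.40, 1.96, 2.07),
  stringsAsFactors = FALSE
)

# Hypothetical helper: give each unreliable row a unique label so that
# grouping by label never merges two unidentified isolates.
unreliable <- report$bruker_species == "not reliable identification"
label <- ifelse(
  unreliable,
  paste0("unidentified_", report$name),
  report$bruker_species
)

# Number identified species first, then each unidentified isolate
# (an assumption, chosen only to mirror the numbering in the table).
label_levels <- c(unique(label[!unreliable]), label[unreliable])
report$membership <- match(label, label_levels)
report$cluster_size <- as.integer(
  table(report$membership)[as.character(report$membership)]
)

# Pick, within each cluster, the row with the highest log-score.
report$to_pick <- report$bruker_log ==
  ave(report$bruker_log, report$membership, FUN = max)

report
```

Because every cluster here contains a single isolate, all four rows are picked, as in the table.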

Using spectra data

Alongside taxonomic identification reports, {maldipickr} processes spectra data. Make sure {maldipickr} is installed and loaded; otherwise, follow the instructions to install the package.

Cherry-picking six isolates from three species based on their spectra data obtained from the MALDI Biotyper is done in a few steps with {maldipickr}.

Get example data

We set the directory location of our example spectra data; adjust it to point to your own data. We then import and process the spectra, which gives us a named list of three objects: spectra, peaks and metadata (more details in the Value section of process_spectra()).

spectra_dir <- system.file("toy-species-spectra", package = "maldipickr")

processed <- spectra_dir %>%
  import_biotyper_spectra() %>%
  process_spectra()

Delineate clusters and cherry-pick

Delineate spectra clusters using cosine similarity and cherry-pick one representative spectrum per cluster. The chosen ones are indicated by the to_pick column.
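As a rough illustration of what thresholded cosine similarity does, the base R sketch below computes the cosine similarity between the rows of a small, made-up intensity matrix and groups the rows that are linked at a threshold of 0.92. The matrix, the row names and the connected-components grouping are assumptions for illustration only; they do not describe how coop::tcosine() or delineate_with_similarity() are implemented.

```r
# Hypothetical intensity matrix: rows are spectra, columns are peak bins.
intensities <- rbind(
  a1 = c(10, 0, 5, 0),
  a2 = c(9, 1, 5, 0),
  b1 = c(0, 8, 0, 7)
)

# Cosine similarity between rows: dot products divided by the row norms.
row_norms <- sqrt(rowSums(intensities^2))
similarity <- tcrossprod(intensities) / tcrossprod(row_norms)

# Link spectra whose similarity reaches the threshold, then treat the
# connected components of that graph as clusters (an assumption).
threshold <- 0.92
linked <- similarity >= threshold
membership <- seq_len(nrow(linked))
repeat {
  # Each spectrum takes the smallest label among itself and its neighbours.
  updated <- apply(linked, 1, function(row) min(membership[row]))
  if (all(updated == membership)) break
  membership <- updated
}

# Renumber the labels consecutively and keep the spectrum names.
membership <- match(membership, unique(membership))
names(membership) <- rownames(intensities)
membership
```

In this toy matrix, a1 and a2 have nearly proportional intensities and end up in the same cluster, while b1 shares almost no signal with them and forms its own.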

processed %>%
  list() %>%
  merge_processed_spectra() %>%
  coop::tcosine() %>%
  delineate_with_similarity(threshold = 0.92) %>%
  set_reference_spectra(processed$metadata) %>%
  pick_spectra() %>%
  dplyr::relocate(name, to_pick) %>% 
  knitr::kable()
| name | to_pick | membership | cluster_size | SNR | peaks | is_reference |
|:--|:--|--:|--:|--:|--:|:--|
| species1_G2 | FALSE | 1 | 4 | 5.089590 | 21 | FALSE |
| species2_E11 | FALSE | 2 | 2 | 5.543735 | 22 | FALSE |
| species2_E12 | TRUE | 2 | 2 | 5.633540 | 23 | TRUE |
| species3_F7 | FALSE | 1 | 4 | 4.889949 | 26 | FALSE |
| species3_F8 | TRUE | 1 | 4 | 5.558884 | 25 | TRUE |
| species3_F9 | FALSE | 1 | 4 | 5.398429 | 25 | FALSE |
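In this particular table, the picked spectrum in each cluster is also the one with the highest SNR. The base R sketch below checks that observation on a copy of the values shown above. It is only a reading of this output: the actual rule used by set_reference_spectra() may combine SNR with other criteria, such as the number of peaks, so the "highest SNR" rule here is an assumption.

```r
# Values copied from the table above.
clusters <- data.frame(
  name = c("species1_G2", "species2_E11", "species2_E12",
           "species3_F7", "species3_F8", "species3_F9"),
  membership = c(1, 2, 2, 1, 1, 1),
  SNR = c(5.089590, 5.543735, 5.633540, 4.889949, 5.558884, 5.398429),
  peaks = c(21, 22, 23, 26, 25, 25),
  stringsAsFactors = FALSE
)

# Flag, within each cluster, the row with the highest SNR (assumed rule).
clusters$highest_snr <- clusters$SNR ==
  ave(clusters$SNR, clusters$membership, FUN = max)

# Names of the flagged spectra, in row order.
clusters$name[clusters$highest_snr]
```

The flagged names are species2_E12 and species3_F8, the same two rows marked TRUE in the to_pick column.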

This provides only a brief overview of the features of {maldipickr}; browse the other vignettes to learn more about additional features.

Session information

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] maldipickr_1.3.2 rmarkdown_2.29  
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_1.8.9           dplyr_1.1.4              compiler_4.4.2          
#>  [4] MALDIquant_1.22.3        tidyselect_1.2.1         parallel_4.4.2          
#>  [7] tidyr_1.3.1              jquerylib_0.1.4          yaml_2.3.10             
#> [10] fastmap_1.2.0            R6_2.5.1                 generics_0.1.3          
#> [13] knitr_1.49               tibble_3.2.1             maketools_1.3.1         
#> [16] readBrukerFlexData_1.9.3 bslib_0.8.0              pillar_1.9.0            
#> [19] rlang_1.1.4              utf8_1.2.4               cachem_1.1.0            
#> [22] xfun_0.49                sass_0.4.9               sys_3.4.3               
#> [25] cli_3.6.3                withr_3.0.2              magrittr_2.0.3          
#> [28] digest_0.6.37            lifecycle_1.0.4          vctrs_0.6.5             
#> [31] evaluate_1.0.1           glue_1.8.0               buildtools_1.0.0        
#> [34] coop_0.6-3               fansi_1.0.6              purrr_1.0.2             
#> [37] tools_4.4.2              pkgconfig_2.0.3          htmltools_0.5.8.1