
Activity in progress: NCI Data Collection

Details

  • Structure prediction requires constructing multiple sequence alignments of homologous sequences from large reference databases.
  • Different tools use different versions of the same reference databases, often under different naming conventions. This leads to excessive duplication and data bloat.
  • Reference data are updated frequently (on differing schedules), but local copies are likely to be updated infrequently.
  • Up-to-date reference data can dramatically improve prediction quality.

This activity aims to create a stable release of up-to-date reference data (with DOI) to support reproducible structure prediction workflows.

Completed

  • Catalog the latest versions of the reference data.
  • Harmonize data across different structure prediction models (AlphaFold2, AlphaFold3, Boltz, ColabFold, HelixFold3, RoseTTAFold-AA).
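
One way to spot the duplication that harmonization is meant to remove is to group files by content rather than by name. The following is a minimal sketch, not the method used in this activity: it assumes each tool keeps its reference data as ordinary files under a common root directory, and the helper names (file_digest, group_duplicates) are hypothetical. Files that differ only in compression or formatting would not be matched by this approach.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so very large database files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each content digest to the files under `root` that share it.

    Only digests shared by two or more files are returned, so every entry
    is a set of byte-identical copies stored under different names.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling group_duplicates(Path("/path/to/reference_data")) on a hypothetical root directory returns one entry per set of identical files, which can then be replaced by a single canonical copy. Hashing multi-hundred-gigabyte databases is slow, so in practice the digests would likely be computed once and stored alongside the catalog.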

In Progress

  • NCI Data Collection expression of interest (EOI).