Details
- Structure prediction requires constructing multiple sequence alignments of homologous sequences from large reference databases.
- Different tools use different versions of the same reference databases, often under different naming conventions. This leads to duplicated copies and data bloat.
- Reference data are frequently updated (on differing schedules), but local copies are often updated infrequently.
- Up-to-date reference data can dramatically improve prediction quality.
This activity aims to create a stable release of up-to-date reference data, with a DOI, to support reproducible structure prediction workflows.
Completed
- Catalog the latest versions of reference data.
- Harmonize data across different structure prediction models (AlphaFold2, AlphaFold3, Boltz, ColabFold, HelixFold3, RoseTTAFold-AA).
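As a rough illustration of the cataloging and harmonizing steps above, the sketch below records each tool's copy of a database under a shared canonical name and flags databases whose copies disagree on version. It is a minimal sketch under stated assumptions, not the catalog produced by this activity: the `DatabaseEntry` fields, the tool names, the database names, and the version labels are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseEntry:
    """One tool's copy of a reference database (all fields are hypothetical)."""
    tool: str        # structure prediction tool that consumes the database
    local_name: str  # name the tool uses for its copy
    canonical: str   # shared name chosen for the harmonized catalog
    version: str     # release label recorded for this copy


def group_by_canonical(entries):
    """Map each canonical database name to the entries that refer to it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.canonical].append(entry)
    return dict(groups)


def version_conflicts(entries):
    """Return canonical names whose copies disagree on version.

    The value for each name is the sorted list of distinct versions seen.
    """
    conflicts = {}
    for canonical, group in group_by_canonical(entries).items():
        versions = sorted({entry.version for entry in group})
        if len(versions) > 1:
            conflicts[canonical] = versions
    return conflicts


# Placeholder entries for illustration only; these are not real catalog data.
EXAMPLE_ENTRIES = [
    DatabaseEntry("tool_a", "seqdb_fasta", "seqdb", "2022_01"),
    DatabaseEntry("tool_b", "SeqDB.fa", "seqdb", "2024_03"),
    DatabaseEntry("tool_a", "metagenomic_clusters", "metagenomic_db", "2022_05"),
    DatabaseEntry("tool_b", "mg_clusters", "metagenomic_db", "2022_05"),
]

if __name__ == "__main__":
    for name, versions in version_conflicts(EXAMPLE_ENTRIES).items():
        print(f"{name}: {', '.join(versions)}")
```

Grouping by a canonical name rather than by each tool's local name is what lets one shared copy serve several tools. A real catalog would likely also record checksums and source URLs, which this sketch omits.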
In Progress
- NCI Data Collection EOI.