Corporate, Sustainability & Governance Datasets for Training LLMs and AI Applications

Our LLM Datasets

We collect and transform corporate disclosure documents from company websites.
Every document is human-tagged and validated, guaranteeing the highest-quality machine-readable datasets, ideal for:

  • training Large Language Models (LLMs);
  • creating NLP classifiers;
  • building AI applications;
  • developing other machine learning (ML) algorithms.

Centralised library of reliable, tagged and machine-readable corporate financial and sustainability reporting data

6,000 companies

150,000 documents

116,000,000 machine-readable sentences