Corporate Reporting LLM Data

Corporate financial and sustainability reporting datasets. Reliable, tagged and machine-readable.

Centralised library of over 6,000 international companies, 150,000 documents and 116 million sentences.

How We Can Help You

Machine-Readable Data

LLM training data incorporating sustainability disclosures from European and US companies since 2019.

Streamlined Delivery

Access through multiple interfaces, making data acquisition and data science simple.

Time-Sensitive Updates

The latest company reports updated regularly across the dataset.

Trustworthy Sources

Transparent and complete data you can trust.

Tagged, pre-trained sentence-level data

Publicly available corporate documents are collected and processed by an automated pipeline. For transparency, the database holds both the original document (PDF) and the extracted machine-readable text, split into sentences. Each sentence is tagged with metadata that attributes it to a company, region, country, industry, reporting year and report type, among other fields.
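As a rough illustration of what a single sentence-level record could look like, the sketch below models it as a Python dataclass and serialises it to JSON. The field names and sample values are assumptions made for illustration only; they are not the published schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SentenceRecord:
    """One extracted sentence plus its metadata (hypothetical field names)."""
    sentence: str
    company: str
    region: str
    country: str
    industry: str
    reporting_year: int
    report_type: str
    source_document: str  # reference to the original PDF (assumed field)


def to_json(record: SentenceRecord) -> str:
    """Serialise a record to a JSON string with keys in a stable order."""
    return json.dumps(asdict(record), sort_keys=True)


# Invented sample values, for illustration only.
example = SentenceRecord(
    sentence="We aim to reduce our operational emissions.",
    company="Example Co",
    region="Europe",
    country="United Kingdom",
    industry="Utilities",
    reporting_year=2023,
    report_type="Sustainability Report",
    source_document="example-co-sustainability-2023.pdf",
)

print(to_json(example))
```

Keeping the source document reference on every record is what would let a reader trace a sentence back to the original PDF.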

Pre-trained Natural Language Processing (NLP) models classify each sentence across 14 sustainability-related topics. These BERT-based models are trained by subject matter experts on issues such as climate change and human rights.

Guided set-up, simple access

Our dataset is delivered via AWS cloud infrastructure in JSON, CSV or text format, with ease of deployment and user access front of mind. We also provide an API service that enables on-demand analytics on our disclosure data. The dataset is also available through our Research portal, which is designed for analytics, benchmarking, interrogation and export of the data.

Data can be curated to client-specific requirements such as sector, region and topic, with topics selected using our NLP classifiers or keywords.
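To make the idea of curating by sector, region, topic or keyword concrete, here is a minimal sketch of such a filter over records held as plain dictionaries. The `curate` function, the dictionary keys and the sample records are all hypothetical; they are not part of the actual delivery format or API.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]


def curate(
    records: List[Record],
    sector: Optional[str] = None,
    region: Optional[str] = None,
    topic: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[Record]:
    """Return the records that match every filter that was supplied."""
    selected = []
    for rec in records:
        if sector is not None and rec.get("sector") != sector:
            continue
        if region is not None and rec.get("region") != region:
            continue
        # "topics" stands in for classifier-assigned labels (assumed key).
        if topic is not None and topic not in rec.get("topics", []):
            continue
        # Keyword match is a case-insensitive substring test on the sentence.
        if keyword is not None and keyword.lower() not in str(rec.get("sentence", "")).lower():
            continue
        selected.append(rec)
    return selected


# Invented sample records, for illustration only.
SAMPLE: List[Record] = [
    {"sector": "Energy", "region": "Europe", "topics": ["climate change"],
     "sentence": "We plan to cut emissions by 2030."},
    {"sector": "Energy", "region": "North America", "topics": ["human rights"],
     "sentence": "Our suppliers must respect human rights."},
    {"sector": "Retail", "region": "Europe", "topics": ["climate change"],
     "sentence": "Store emissions fell last year."},
]

european_climate = curate(SAMPLE, region="Europe", topic="climate change")
print(len(european_climate))
```

Filters combine with a logical AND, so passing more arguments narrows the result.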

Most recent reports available

We regularly review company websites to ensure we hold every document a company has made public. This lets users track industry trends in corporate financial and sustainability reporting and language.

Our scalable document collection pipeline also allows efficient gathering of data required by clients not already in the document library, i.e. document collection on demand.

Data you can rely on

For each document collected, we record the source of the information at the time of collection, so our dataset of sentences can be trusted as authentic and appropriate for LLM use. While our process is largely automated, our team takes the time to manually tag, review and validate the results. This ensures that you can trust what we deliver and that the quality meets your expectations.

Related Insights

Using AI to help financial regulators detect greenwashing. Presentation with ImpactScope at World AI Cannes Festival 2024

8 April 2024

Harnessing specialised large language models for corporate sustainability reporting

31 January 2024

Transparency & Disclosure Index reveals stark differences in reporting patterns across UK's largest companies

20 November 2023

Only 5% of FTSE100 have credible climate transitions plans according to EY: Insig AI's response

4 April 2023

Generative AI: The game-changing access to advanced technology

9 March 2023

Building a best practice ESG risk scoring system for private entities

20 January 2023
