VCM.fyi Data Pipeline

Data Pipeline

Daily automated collection, harmonization, and delivery of project and credit data from 6 carbon registries into a single unified database.

6 Registries
11,000+ Projects
500,000+ Credit Transactions
Daily Pipeline Cadence

The VCM.fyi Data Pipeline is the foundation of the entire platform. Every day, raw data from six carbon registries is fetched, processed, harmonized into a common schema, and loaded into the production database. As a result, the platform reflects registry state as of the most recent daily run: new projects, fresh issuances, and recent retirements.

The Problem

Carbon credit data is fragmented across six major registries, each with different data formats, field names, and update schedules. A simple question like "How many credits has this project issued?" requires checking multiple registry websites, each with different interfaces and data structures. Analysts spend hours on manual data collection that could be automated.

How It Works

Raw Data Fetch

Automated workflows fetch the latest data from each registry every morning. Isometric data is retrieved daily via its API. Data from Verra, Gold Standard, ACR, CAR, and Puro Earth is fetched from raw exports and stored in our canonical data warehouse.

Processing and Harmonization

Raw registry data is processed into a harmonized schema. Field names are standardized (each registry uses different names for the same concept), project types are mapped to a common taxonomy, credit quantities are normalized, and duplicate records are resolved. The output is a set of columnar files optimized for analytical queries.
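
To make the taxonomy mapping and duplicate resolution concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not our production code: the taxonomy entries, the record keys (registry, project_id, project_type), and the "later record wins" rule are all hypothetical.

```python
# Hypothetical mapping from registry-specific project types to a common taxonomy.
TYPE_TAXONOMY = {
    "improved forest management": "Forestry",
    "wind": "Renewable Energy",
    "biochar": "Carbon Removal",
}


def normalize_type(raw_type):
    """Map a raw project type to the common taxonomy, or 'Other' if unknown."""
    return TYPE_TAXONOMY.get(raw_type.strip().lower(), "Other")


def deduplicate(records):
    """Keep one record per (registry, project_id); a later record replaces an earlier one."""
    latest = {}
    for record in records:
        latest[(record["registry"], record["project_id"])] = record
    return list(latest.values())


def harmonize(records):
    """Resolve duplicates, then rewrite each project type into the common taxonomy."""
    return [
        {**record, "project_type": normalize_type(record["project_type"])}
        for record in deduplicate(records)
    ]
```

A real taxonomy would need far more entries and a review step for types that fall into "Other"; the sketch only shows the shape of the transformation.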

Database Seeding

Processed data is loaded into the production database. This is a full refresh — the database reflects the complete current state of all registries, not just incremental changes. This approach ensures consistency and eliminates drift between the database and registry reality.
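
The full-refresh idea can be sketched as a delete-and-reload inside a single transaction, so readers never see a half-loaded table. The example below uses Python's built-in sqlite3 module purely as a stand-in for the production database; the projects table and its columns are hypothetical.

```python
import sqlite3


def full_refresh(conn, rows):
    """Replace the whole projects table in one transaction.

    The connection context manager commits on success and rolls back
    on any exception, so a failed load leaves the previous data intact.
    """
    with conn:
        conn.execute("DELETE FROM projects")
        conn.executemany(
            "INSERT INTO projects (registry, project_id, credits_issued) "
            "VALUES (?, ?, ?)",
            rows,
        )


# Hypothetical table, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects ("
    "registry TEXT, project_id TEXT, credits_issued INTEGER, "
    "PRIMARY KEY (registry, project_id))"
)

full_refresh(conn, [("verra", "100", 5000), ("acr", "7", 1200)])
# The next day's load fully replaces the previous contents.
full_refresh(conn, [("verra", "100", 6500)])
```

Because every run rewrites the table from the processed files, a record that disappears from a registry also disappears from the database, which an incremental upsert would not guarantee.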

Archival

Daily snapshots are archived with 30-day retention, enabling historical comparisons and data recovery. Raw registry data is preserved permanently.
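
A retention rule like this can be expressed as a small date filter. The sketch below is an assumption about how such pruning might be written, not a description of our archival job; the function name and the exact boundary (a snapshot exactly 30 days old is kept) are hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # from the retention policy described above


def snapshots_to_delete(snapshot_dates, today, retention_days=RETENTION_DAYS):
    """Return snapshot dates older than the retention window, oldest first.

    Assumption: a snapshot exactly `retention_days` old is still kept.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


# Example: 35 consecutive daily snapshots ending on 2025-03-31.
today = date(2025, 3, 31)
snapshots = [today - timedelta(days=i) for i in range(35)]
expired = snapshots_to_delete(snapshots, today)
```

Raw registry data would be excluded from this filter entirely, since it is preserved permanently.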

Data Sources

Verra (VCS) — projects, credits, transactions
Gold Standard — projects, credits, transactions
American Carbon Registry (ACR) — projects, credits, transactions
Climate Action Reserve (CAR) — projects, credits, transactions
Isometric — projects, credits (daily API)
Puro Earth — projects, credits

Technical Details

Schedule

The pipeline runs every morning and completes before business hours in any major time zone. Raw data fetch, processing, and database seeding happen in sequence automatically.
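
Running the stages in sequence means a failure in one stage should stop the stages after it, so the database is never seeded from a failed fetch or a failed processing step. A minimal sketch of that control flow, with hypothetical stage names and a hypothetical StageFailed error type:

```python
class StageFailed(RuntimeError):
    """Raised when a pipeline stage fails; later stages are skipped."""


def run_pipeline(stages):
    """Run (name, callable) stages in order and return the names that completed."""
    completed = []
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            raise StageFailed(
                f"stage {name!r} failed after completing {completed}"
            ) from exc
        completed.append(name)
    return completed
```

In practice a workflow orchestrator provides this ordering, along with retries and alerting; the sketch only illustrates the dependency between stages.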

Infrastructure

The pipeline runs on cloud-based orchestration with automated workflows, scalable object storage, and a production PostgreSQL database. Columnar intermediate formats enable efficient processing of large datasets.

Schema Harmonization

Each registry's unique field names, data types, and conventions are mapped to a common schema. For example, Verra's "Proponent" and Gold Standard's "Project Developer" both map to the "proponent" field. Credit quantities are standardized to tonnes of CO₂ equivalent.
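
The field mapping described here can be sketched as a per-registry lookup table. Only the "Proponent" and "Project Developer" mappings come from the text above; the quantity field names, the credits_issued_tco2e target, and the comma-stripping rule are hypothetical and shown only to illustrate quantity normalization.

```python
# Per-registry source field -> common schema field.
# The "proponent" mappings are from the text; the quantity fields are hypothetical.
FIELD_MAPS = {
    "verra": {
        "Proponent": "proponent",
        "Quantity Issued": "credits_issued_tco2e",
    },
    "gold_standard": {
        "Project Developer": "proponent",
        "Credits Issued": "credits_issued_tco2e",
    },
}


def parse_tonnes(value):
    """Convert a quantity such as '1,250' or 1250 to whole tonnes of CO2e."""
    if isinstance(value, str):
        value = value.replace(",", "").strip()
    return int(float(value))


def harmonize_record(registry, raw):
    """Rename mapped fields and normalize quantities; unmapped fields are dropped."""
    mapping = FIELD_MAPS[registry]
    record = {"registry": registry}
    for source_name, value in raw.items():
        target = mapping.get(source_name)
        if target is None:
            continue  # this sketch ignores fields without a mapping
        record[target] = parse_tonnes(value) if target.endswith("_tco2e") else value
    return record
```

With this shape, adding a registry means adding one entry to the lookup table rather than writing new transformation logic.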

Quality & Accuracy

Data comes directly from official registry records — the authoritative source for carbon credit information
Full database refresh (not incremental) eliminates data drift and ensures consistency
Columnar intermediate format preserves data types and prevents conversion errors
Daily snapshots enable point-in-time comparison to detect anomalies
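
As an illustration of point-in-time comparison, the sketch below compares two daily snapshots and flags two conditions that would merit a closer look. The snapshot shape (a dict keyed by registry and project ID, holding cumulative credits issued) and both heuristics are assumptions made for this example, not a description of our actual checks.

```python
def find_anomalies(previous, current):
    """Compare two snapshots mapping (registry, project_id) -> cumulative credits issued.

    Flags projects that vanished and projects whose cumulative total went down.
    """
    anomalies = []
    for key, prev_total in previous.items():
        if key not in current:
            anomalies.append((key, "missing from current snapshot"))
        elif current[key] < prev_total:
            anomalies.append((key, "cumulative issuance decreased"))
    return anomalies


yesterday = {("verra", "100"): 5000, ("acr", "7"): 1200, ("car", "3"): 800}
today = {("verra", "100"): 6500, ("acr", "7"): 900}
flags = find_anomalies(yesterday, today)
```

Here flags contains one entry for the ACR project, whose total dropped, and one for the CAR project, which disappeared between snapshots.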

Update Frequency

Daily. The pipeline runs every morning and data is available in the platform before business hours.
