VCM.fyi Data Pipeline
Daily automated collection, harmonization, and delivery of project and credit data from 6 carbon registries into a single unified database.
- Registries: 6
- Projects: 11,000+
- Credit transactions: 500,000+
- Pipeline cadence: daily
The VCM.fyi Data Pipeline is the foundation of the entire platform. Every day, raw data from six carbon registries is fetched, processed, harmonized into a common schema, and loaded into the production database. As a result, the platform reflects registry state as of the most recent daily run, including new projects, fresh issuances, and recent retirements.
The Problem
Carbon credit data is fragmented across six major registries, each with different data formats, field names, and update schedules. A simple question like "How many credits has this project issued?" requires checking multiple registry websites, each with different interfaces and data structures. Analysts spend hours on manual data collection that could be automated.
How It Works
Raw Data Fetch
Automated workflows fetch the latest data from each registry every morning. Isometric data is retrieved daily through its API. Data for Verra, Gold Standard, ACR, CAR, and Puro Earth is fetched from raw exports and stored in our canonical data warehouse.
Processing and Harmonization
Raw registry data is processed into a harmonized schema. Field names are standardized (each registry uses its own names for the same concepts), project types are mapped to a common taxonomy, credit quantities are normalized, and duplicate records are resolved. The output is written as columnar files optimized for analytical queries.
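As a rough illustration of the quantity normalization and duplicate resolution steps, the sketch below parses export-style quantity strings and keeps one record per project. The field names (`registry`, `project_id`, `quantity`) and the last-record-wins rule are assumptions made for this example, not the pipeline's actual schema or policy.

```python
from typing import Iterable


def parse_quantity(raw) -> float:
    """Convert values such as "1,250" or 1250 into a float credit count."""
    if isinstance(raw, (int, float)):
        return float(raw)
    # Assumed export quirk: thousands separators and stray whitespace.
    return float(str(raw).replace(",", "").strip())


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the last record seen for each (registry, project_id) pair."""
    latest: dict[tuple[str, str], dict] = {}
    for record in records:
        key = (record["registry"], record["project_id"])
        latest[key] = record  # later rows overwrite earlier duplicates
    return list(latest.values())


def normalize(records: Iterable[dict]) -> list[dict]:
    """Deduplicate, then store every quantity as a float."""
    return [
        {**record, "quantity": parse_quantity(record["quantity"])}
        for record in deduplicate(records)
    ]
```

Keying on the registry as well as the project ID matters because two registries can reuse the same numeric ID for unrelated projects.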
Database Seeding
Processed data is loaded into the production database. This is a full refresh: each run rebuilds the database from the complete current state of all registries rather than applying incremental changes. This approach ensures consistency and eliminates drift between the database and the registries.
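A minimal sketch of a full refresh follows, assuming a single hypothetical `projects` table with `registry`, `project_id`, and `name` columns. It uses Python's built-in `sqlite3` module as a stand-in so the example runs anywhere; the production database is PostgreSQL, and the actual loading mechanism is not described here.

```python
import sqlite3


def full_refresh(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    """Replace the whole projects table with the latest processed rows."""
    # The connection context manager commits on success and rolls back
    # if any statement raises, so readers never see a half-loaded table.
    with conn:
        conn.execute("DELETE FROM projects")
        conn.executemany(
            "INSERT INTO projects (registry, project_id, name) VALUES (?, ?, ?)",
            rows,
        )
```

Because the delete and the inserts share one transaction, a failed load leaves the previous day's data in place, and a project that disappears from a registry also disappears from the table on the next successful run.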
Archival
Daily snapshots are archived with 30-day retention, enabling historical comparisons and data recovery. Raw registry data is preserved permanently.
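The 30-day retention rule can be sketched as a small date filter. The function name and the exact boundary (a snapshot exactly 30 days old is still kept) are assumptions for illustration; raw registry data would be excluded from any such cleanup because it is preserved permanently.

```python
from datetime import date, timedelta


def expired_snapshots(
    snapshot_dates: list[date], today: date, retention_days: int = 30
) -> list[date]:
    """Return snapshot dates that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    # Anything strictly older than the cutoff is eligible for deletion.
    return sorted(d for d in snapshot_dates if d < cutoff)
```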
Data Sources
- Verra (VCS) — projects, credits, transactions
- Gold Standard — projects, credits, transactions
- American Carbon Registry (ACR) — projects, credits, transactions
- Climate Action Reserve (CAR) — projects, credits, transactions
- Isometric — projects, credits (daily API)
- Puro Earth — projects, credits
Technical Details
Schedule
The pipeline runs every morning and completes before business hours in any major time zone. Raw data fetch, processing, and database seeding happen in sequence automatically.
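One way to picture the sequencing is a runner that executes each stage in order and stops at the first failure. This is only a sketch: the `run_in_sequence` helper and the stage names are hypothetical, and the real pipeline relies on its cloud orchestration layer rather than a function like this.

```python
from typing import Callable

Stage = tuple[str, Callable[[], None]]


def run_in_sequence(stages: list[Stage]) -> list[str]:
    """Run stages in order and return the names that completed.

    An exception from any stage propagates immediately, so later stages
    never run against incomplete output from an earlier one.
    """
    completed: list[str] = []
    for name, stage in stages:
        stage()
        completed.append(name)
    return completed
```

Stopping on the first error means a failed fetch cannot trigger a database seed from stale or partial files.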
Infrastructure
The pipeline runs on cloud-based orchestration with automated workflows, scalable object storage, and a production PostgreSQL database. Columnar intermediate formats ensure efficient processing of large datasets.
Schema Harmonization
Each registry's unique field names, data types, and conventions are mapped to a common schema. Examples: Verra's "Proponent" and Gold Standard's "Project Developer" both map to the "proponent" field. Credit quantities are standardized to tonnes of CO₂ equivalent.
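The field-name mapping can be sketched as a per-registry lookup table. Only the two mappings named above come from this page; the registry keys (`verra`, `gold_standard`), the `FIELD_MAP` structure, and the choice to drop unmapped fields are assumptions for illustration.

```python
# Source field name -> harmonized field name, per registry.
FIELD_MAP: dict[str, dict[str, str]] = {
    "verra": {"Proponent": "proponent"},
    "gold_standard": {"Project Developer": "proponent"},
}


def harmonize_fields(registry: str, raw: dict) -> dict:
    """Rename a raw record's fields to the common schema.

    Fields without a mapping are dropped in this sketch.
    """
    mapping = FIELD_MAP[registry]
    return {mapping[key]: value for key, value in raw.items() if key in mapping}
```

With this table, `harmonize_fields("verra", {"Proponent": "Acme"})` and `harmonize_fields("gold_standard", {"Project Developer": "Acme"})` both yield a record with a single `proponent` field.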
Quality & Accuracy
- Data comes directly from official registry records — the authoritative source for carbon credit information
- Full database refresh (not incremental) eliminates data drift and ensures consistency
- Columnar intermediate format preserves data types and prevents conversion errors
- Daily snapshots enable point-in-time comparison to detect anomalies
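As an example of the point-in-time comparison mentioned in the last item, the sketch below compares two snapshots, each reduced to a mapping from project ID to cumulative credits issued. The two rules it applies (a project vanished, or its issued total went down) are hypothetical checks chosen for illustration, not the platform's actual anomaly rules.

```python
def find_anomalies(previous: dict[str, float], current: dict[str, float]) -> list[str]:
    """Flag projects that vanished or whose cumulative issuance went down."""
    anomalies: list[str] = []
    for project_id, old_total in sorted(previous.items()):
        if project_id not in current:
            anomalies.append(f"{project_id}: missing from latest snapshot")
        elif current[project_id] < old_total:
            anomalies.append(
                f"{project_id}: issued total fell from {old_total} to {current[project_id]}"
            )
    return anomalies
```

Projects that appear only in the newer snapshot are treated as normal growth and are not flagged.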
Update Frequency
Daily. The pipeline runs every morning and data is available in the platform before business hours.