COP-SAID: Speaker Identification and Affiliation Extraction in Climate Conference Transcripts

Overview

The United Nations Framework Convention on Climate Change Conference of the Parties (COP) produces thousands of hours of recorded plenary sessions, negotiating group meetings, and side events each year. These recordings contain invaluable evidence of national positions, coalition dynamics, and the evolution of climate policy discourse — but they exist largely as unstructured audio-visual archives that are difficult to search, analyze, or attribute to specific speakers and organizations.

COP-SAID (Speaker Affiliation Identification and Detection) builds an end-to-end NLP pipeline to convert these recordings into structured, speaker-attributed text archives. The key technical insight is that moderators at COP sessions follow predictable introduction patterns (“I now give the floor to Ambassador X from Y”), and these structured cues can serve as high-confidence anchor points for speaker identification across the full transcript.

Technical Approach

The pipeline integrates four technologies into a unified system.

Automatic Speech Recognition (ASR): Tools such as Whisper convert COP audio into time-stamped text transcripts. The multilingual nature of COP sessions — with interpretation in multiple UN languages — is handled through language-specific ASR models.

Speaker Diarization: Systems such as pyannote.audio partition the audio into speaker segments (S1, S2, …), producing a sequence of labeled turns without yet knowing who each speaker is.

Transformer-Based Speaker Identification: A BERT-family or entity-aware transformer model (such as LUKE) processes windowed contexts around each speaker segment. When a moderator introduction is detected — using a fine-tuned classifier or pattern-based rules — the named person and affiliation are queued and linked to the immediately following speaker segment. The model handles cases where multiple speakers are introduced in sequence by disambiguating based on first-person references and contextual cues in the actual speech content.

Knowledge Graph Integration: A curated entity database of known COP participants — names, titles, delegation countries, and organizational roles — enables entity linking and ensures global consistency: if a speaker is identified as Ambassador X in one session, all subsequent segments assigned to the same diarization ID receive the same attribution unless contradicting evidence emerges.

Post-processing applies coreference resolution to link pronouns and titles to identified named entities, with confidence thresholding to route low-confidence segments to human review.

Significance

Incorporating moderator introduction cues into the speaker identification pipeline provides a 5–10% relative improvement in accuracy over rule-based baselines that rely only on self-introductions. The knowledge graph integration significantly reduces cross-session inconsistency errors.

A fully attributed and searchable COP transcript archive would enable several classes of research that are currently impractical: longitudinal tracking of national positions across COP sessions, network analysis of which delegations interact during negotiations, and automated extraction of specific policy commitments or objections. For transparency and accountability in climate governance, the ability to attribute statements precisely to speakers and their organizations is foundational.

The methodology generalizes to other international forums with similar moderator-introduction conventions — G20 sessions, World Economic Forum panels, and UN General Assembly debates — making COP-SAID a platform for a broader program of structured climate policy discourse analysis.