Implement the InfoNCE contrastive loss from “Representation Learning with Contrastive Predictive Coding” (van den Oord et al., 2018).
InfoNCE is used in CLIP, SimCLR, and many self-supervised learning methods.
Given a batch of N anchor-positive pairs with embeddings:

- `anchors`: shape (N, d) — anchor embeddings
- `positives`: shape (N, d) — positive embeddings, matched index-wise to the anchors
- `temperature`: float — temperature scaling factor τ

The loss treats each anchor's own positive as the correct match and all other positives in the batch as negatives:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp(a_i \cdot p_i / \tau)}{\sum_{j=1}^{N} \exp(a_i \cdot p_j / \tau)}$$
Output: A scalar (float) — the average InfoNCE loss.
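A minimal NumPy sketch of one possible solution (the function name `info_nce_loss` and the default temperature are assumptions, not part of the spec). It builds the (N, N) similarity matrix, applies a numerically stable log-softmax over each row, and averages the negative log-probabilities on the diagonal, which holds the matched pairs:

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.1) -> float:
    """InfoNCE loss: for each anchor, its matched positive is the target
    among all N positives in the batch (sketch, not a reference solution)."""
    # Pairwise similarity logits: entry (i, j) is a_i · p_j / τ.
    logits = anchors @ positives.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct match for anchor i is positive i (the diagonal).
    return float(-np.mean(np.diag(log_probs)))

# Usage: random embeddings, loss is a plain Python float.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 16))
p = rng.standard_normal((8, 16))
loss = info_nce_loss(a, p, temperature=0.5)
```

Note that each row of `log_probs` is a softmax over N classes, so the loss is equivalent to cross-entropy against the labels `[0, 1, ..., N-1]`, which is how it is typically implemented in deep-learning frameworks.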