── Data sources ─────────────────────────────────────────────────────────────────

- Crossref (crossref.org), DataCite (datacite.org), OpenAlex (openalex.org)
  Primary metadata (title, abstract) and DOI resolution.

- Retraction Watch (gitlab.com/crossref/retraction-watch-data)
  Title, status, and retraction reasons. Synced daily.
  Reinstatements act as chronological resets; retractions supersede
  expressions of concern, pooling reasons exclusively from the dominant
  status (dedup).

- Unpaywall (unpaywall.org)
  Open-access routing with fallback to native doi.org resolution.

── Abstract extraction mechanism ────────────────────────────────────────────────

Greedy facility-location coverage using BGE-M3 (BAAI/bge-m3) sentence embeddings.
First pick: the centroid-nearest, semantically representative sentence.
Subsequent picks: max-marginal gain until gain ≤ μ (mean pairwise similarity).
Output: selected sentences in original order, with "…" marking gaps.
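
The extraction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual code. The names select_sentences and render are hypothetical. Embeddings are assumed to be precomputed (for example with BAAI/bge-m3) and passed in as a NumPy array, one non-zero row per sentence. Three further details are assumptions, since the description does not settle them: similarity is cosine, μ is averaged over distinct pairs only, and the stopping test compares the summed (unnormalised) marginal gain with μ. Marking only gaps between selected sentences with "…" is likewise a guess.

```python
import numpy as np


def select_sentences(emb: np.ndarray) -> list[int]:
    """Greedy facility-location selection; returns indices in original order."""
    n = emb.shape[0]
    if n == 0:
        return []
    if n == 1:
        return [0]
    # Unit-normalise rows so that dot products are cosine similarities.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    # mu: mean pairwise similarity (assumption: the diagonal is excluded).
    mu = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    # First pick: the sentence whose embedding is closest to the centroid.
    first = int(np.argmax(unit @ unit.mean(axis=0)))
    selected = [first]
    # cover[i] = best similarity between sentence i and any selected sentence.
    cover = sim[first].copy()
    while len(selected) < n:
        # gains[j] = increase in sum_i max-similarity if sentence j is added.
        gains = np.maximum(sim - cover[None, :], 0.0).sum(axis=1)
        gains[selected] = -np.inf
        best = int(np.argmax(gains))
        # Assumed stopping rule: stop once the best gain is no larger than mu.
        if gains[best] <= mu:
            break
        selected.append(best)
        cover = np.maximum(cover, sim[best])
    return sorted(selected)


def render(sentences: list[str], picked: list[int]) -> str:
    """Join the picked sentences in order, inserting "…" where some were skipped."""
    parts: list[str] = []
    prev = None
    for i in picked:
        if prev is not None and i != prev + 1:
            parts.append("…")
        parts.append(sentences[i])
        prev = i
    return " ".join(parts)
```

For instance, render(["A.", "B.", "C.", "D."], [0, 2, 3]) gives "A. … C. D.". With unit-length embeddings, the centroid-nearest sentence is also the one that maximises the facility-location objective for a single pick, so the first step is consistent with the greedy steps that follow.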