a Mean number of concept identifiers per unique mention string.
b Mean number of surface forms per concept identifier; reported only for corpora with IDs.
c Shannon entropy of the label distribution, in bits; 0 = single entity type.
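The three footnoted metrics can be sketched as follows. This is a minimal illustration, not the code behind the report: the function names, the `(mention, concept_id)` pair layout, and the example identifiers are hypothetical, and mention strings are compared exactly, with no case folding.

```python
import math
from collections import Counter, defaultdict


def mean_ids_per_mention(pairs):
    """Footnote a: mean number of distinct concept IDs per unique mention string."""
    ids_by_mention = defaultdict(set)
    for mention, concept_id in pairs:
        ids_by_mention[mention].add(concept_id)
    # Assumes at least one pair; an empty input would divide by zero.
    return sum(len(ids) for ids in ids_by_mention.values()) / len(ids_by_mention)


def mean_forms_per_id(pairs):
    """Footnote b: mean number of distinct surface forms per concept ID."""
    forms_by_id = defaultdict(set)
    for mention, concept_id in pairs:
        forms_by_id[concept_id].add(mention)
    return sum(len(forms) for forms in forms_by_id.values()) / len(forms_by_id)


def label_entropy_bits(labels):
    """Footnote c: Shannon entropy of the label distribution, in bits."""
    counts = Counter(labels)
    total = sum(counts.values())
    # p * log2(1/p) summed over labels; a single label gives exactly 0.0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical annotations: (mention string, concept identifier).
pairs = [("p53", "ID:1"), ("p53", "ID:2"), ("TP53", "ID:1")]
print(mean_ids_per_mention(pairs), mean_forms_per_id(pairs))
print(label_entropy_bits(["Gene", "Gene"]), label_entropy_bits(["Gene", "Disease"]))
```

A corpus with one entity type yields an entropy of 0 bits, and two equally frequent types yield 1 bit.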
All values are Jaccard similarity (intersection / union) between splits.
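As a minimal sketch of the Jaccard computation named above, assuming each split is reduced to a set of items (mention strings, identifiers, or similar). The function name and the convention of returning 0.0 for two empty sets are assumptions, not taken from the report.

```python
def jaccard(split_a, split_b):
    """Jaccard similarity: |intersection| / |union| of two item collections."""
    a, b = set(split_a), set(split_b)
    union = a | b
    if not union:
        # Assumption: two empty splits are reported as 0.0 rather than undefined.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical train/test mention sets sharing two of four distinct items.
train = {"aspirin", "ibuprofen", "caffeine"}
test = {"ibuprofen", "caffeine", "warfarin"}
print(jaccard(train, test))
```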
Overlap cascade
Each line traces one corpus across four abstraction levels. Lines that terminate before the identifier level indicate corpora without concept normalization.
Unique journal count
Journal concentration
Legend: Top-1 journal, Top-3 journals
All 9 corpora have journal metadata. Unique journal count serves as a proxy for
language diversity. Concentration, shown for the top-1 and top-3 journals, reveals
whether a corpus is dominated by a small number of sources. Faded bars indicate
corpora with no metadata.
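The journal statistics above might be computed along these lines. The report does not define concentration precisely, so this sketch assumes it is the share of documents published in the single most frequent journal (top-1) and in the three most frequent journals (top-3). The function name and input layout are hypothetical.

```python
from collections import Counter


def journal_stats(journals):
    """Unique journal count plus top-1 and top-3 document shares.

    `journals` holds one journal name per document (hypothetical layout).
    """
    counts = Counter(journals)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no journal metadata available")
    ranked = [n for _, n in counts.most_common()]  # document counts, descending
    return {
        "unique_journals": len(counts),
        "top1_share": ranked[0] / total,
        "top3_share": sum(ranked[:3]) / total,
    }


# Hypothetical corpus of 10 documents from 4 journals.
docs = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
print(journal_stats(docs))
```

A top-3 share close to 1 would indicate a corpus drawn almost entirely from a handful of sources.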
Publication year range
Decade share per corpus
Year-by-year: oldest vs most recent
Hover over a range bar to see the mode year. Models trained on corpora anchored in
pre-2000 literature risk reduced performance on contemporary terminology.
Article topic distribution per corpus (%)

| Topic | AnatEM | BC5CDR | BioID | CHEMDNER | CRAFT | CellLink | JNLPBA | NCBI-Disease | NLM-Chem |
|---|---|---|---|---|---|---|---|---|---|
| Multidisciplinary | 2% | — | 2% | — | — | — | — | — | — |
| Cell & developmental biology | 7% | 1% | 10% | 5% | 9% | 17% | 12% | 4% | 6% |
| Molecular biology / biochemistry | 16% | 6% | 62% | 15% | 21% | 12% | 36% | 17% | 18% |
| Genetics/genomics | 5% | — | 8% | 3% | 18% | 8% | 10% | 28% | 4% |
| Neuroscience & neurology | 2% | 4% | 1% | 1% | 2% | 4% | — | 2% | — |
| Microbiology/pathogenesis | 2% | — | 2% | 1% | — | 2% | 2% | — | 1% |
| Pharmacology | 2% | 6% | — | 6% | — | — | 1% | — | 2% |
| Toxicology | — | — | — | 1% | — | — | — | — | — |
| Oncology | 4% | — | — | — | — | 2% | — | 1% | 1% |
| Public health / health services | 3% | 2% | — | 2% | 1% | 1% | — | — | 2% |
| Chemistry / Materials Science | 8% | 15% | — | 29% | 4% | 4% | 11% | 5% | 29% |
| Immunology | 1% | — | 2% | — | — | 4% | 6% | 1% | — |
| Psychiatry & psychology | 1% | 3% | — | 1% | 1% | — | — | — | — |
| Health disciplines | 3% | 2% | — | 2% | 1% | 1% | — | — | 2% |
| General biology / anatomy / physiology | 14% | 14% | 6% | 18% | 23% | 22% | 11% | 11% | 14% |
| General natural sciences | 2% | — | — | 3% | 1% | 2% | 3% | 3% | 4% |
| General / internal medicine | 6% | 6% | 7% | 2% | 1% | 3% | — | 1% | 2% |
| Nutrition, metabolism, and food science | — | — | — | 1% | — | — | — | 1% | — |
| Surgery / anesthesia / perioperative | 2% | 2% | — | — | — | — | — | — | — |
| Diagnostics / pathology / radiology | 2% | 4% | — | 2% | 1% | 4% | 1% | 2% | 2% |
| Pediatrics / reproductive / developmental medicine | — | — | — | — | — | — | — | 2% | — |
| Clinical specialties by organ system | 8% | 11% | — | 3% | 7% | 9% | 2% | 5% | 4% |
| Demographic characteristics | 7% | 18% | — | 4% | 4% | 5% | 3% | 16% | 6% |
| Total shown | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
Legend: Lower / Higher within topic.
Topics are high-level MeSH-derived article categories resolved from article
metadata MeSH terms, with unresolved article-term fractions filled from journal
MeSH topics and configured journal-name fallback topics. Only topics with ≥ 1%
share in at least one corpus are shown. Dominant value per row is bold.
Percentages may not sum to exactly 100 due to rounding.
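The display rule in the note above (keep a topic if it reaches 1% in at least one corpus, then round) can be sketched as below. The function name and the nested-dict layout are hypothetical, and applying the threshold to unrounded fractions is an assumption. The example also shows why rounded columns need not add up to exactly 100.

```python
def visible_topic_table(shares, threshold=0.01):
    """Filter topics and round shares to whole percentages.

    `shares` maps corpus -> {topic: fraction of articles}; this layout is
    hypothetical. A topic is kept if its unrounded share meets `threshold`
    in at least one corpus (assumed reading of the ">= 1%" rule).
    """
    topics = sorted({t for per_corpus in shares.values() for t in per_corpus})
    kept = [
        t for t in topics
        if any(per_corpus.get(t, 0.0) >= threshold for per_corpus in shares.values())
    ]
    return {
        t: {corpus: round(100 * per_corpus.get(t, 0.0))
            for corpus, per_corpus in shares.items()}
        for t in kept
    }


# Hypothetical shares for two corpora; topic "C" never reaches 1%.
shares = {
    "X": {"A": 0.664, "B": 0.333, "C": 0.003},
    "Y": {"A": 0.5, "B": 0.5},
}
table = visible_topic_table(shares)
print(table)
print(sum(row["X"] for row in table.values()))  # rounded column total for corpus X
```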
Journal topic distribution per corpus (%)

| Topic | AnatEM | BC5CDR | BioID | CHEMDNER | CRAFT | CellLink | JNLPBA | NCBI-Disease | NLM-Chem |
|---|---|---|---|---|---|---|---|---|---|
| Multidisciplinary | 4% | — | 2% | — | — | 5% | 6% | 12% | 8% |
| Cell & developmental biology | 5% | — | 10% | 1% | 13% | 10% | 6% | 2% | 2% |
| Molecular biology / biochemistry | 7% | 2% | 62% | 12% | 8% | 6% | 16% | 3% | 20% |
| Genetics/genomics | 4% | — | 8% | 2% | 30% | 5% | 4% | 56% | 3% |
| Neuroscience & neurology | 2% | 10% | 1% | — | 7% | 4% | — | 3% | 3% |
| Microbiology/pathogenesis | 3% | — | 2% | — | — | 2% | 6% | — | 1% |
| Pharmacology | 3% | 9% | — | 24% | — | — | 2% | — | — |
| Toxicology | — | 3% | — | 11% | — | — | — | — | — |
| Oncology | 13% | 4% | — | — | 3% | 2% | 4% | 2% | 3% |
| Public health / health services | 3% | — | — | 1% | — | — | — | — | — |
| Chemistry / Materials Science | 4% | — | — | 20% | — | 8% | 4% | — | 26% |
| Immunology | 1% | — | 2% | — | — | 12% | 22% | 1% | — |
| Psychiatry & psychology | 2% | 7% | — | 1% | — | — | — | — | 3% |
| Health disciplines | 6% | 6% | — | 2% | — | 2% | 3% | 1% | 3% |
| General biology / anatomy / physiology | 5% | 2% | 6% | 7% | 14% | 18% | 4% | 3% | 5% |
| General natural sciences | 4% | 2% | — | 6% | 12% | 11% | 4% | 3% | 8% |
| General / internal medicine | 14% | 21% | 7% | 3% | 5% | 11% | 11% | 8% | 11% |
| Nutrition, metabolism, and food science | — | — | — | 5% | — | — | — | — | — |
| Surgery / anesthesia / perioperative | 3% | 9% | — | — | — | — | — | — | — |
| Diagnostics / pathology / radiology | 4% | 3% | — | — | — | — | — | — | — |
| Pediatrics / reproductive / developmental medicine | 1% | 3% | — | — | — | — | — | 2% | — |
| Clinical specialties by organ system | 10% | 16% | — | 4% | 6% | 4% | 5% | 2% | 1% |
| Total shown | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
Legend: Lower / Higher within topic.
Topics are high-level MeSH-derived journal categories resolved from the journal
record's NLM Catalog MeSH topics, with configured journal-name fallback topics for
journals that do not have MeSH topics. Only topics with ≥ 1% share in at least one
corpus are shown. Dominant value per row is bold. Percentages may not sum to exactly
100 due to rounding.
Deprecated terms summary
| Corpus | Terminology | Total concepts | Deprecated concepts | Resolvable identifier rate |
|---|---|---|---|---|
Resolvable identifier rate
Coverage counts only identifiers whose resource is associated with the selected terminology.
Annotation depth distribution
Terminology coverage
Terminology coverage = unique corpus concept count in branch ÷ total terminology concepts in that branch.
Only branches with signal in the selected scope are shown.
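The terminology coverage ratio defined above can be sketched as follows. The names `branch_of` and `branch_sizes` are hypothetical lookup tables, and the sketch assumes each concept belongs to exactly one branch, which real terminologies do not always guarantee.

```python
from collections import defaultdict


def terminology_coverage(corpus_concepts, branch_of, branch_sizes):
    """Unique corpus concepts in a branch / total terminology concepts in it.

    `branch_of` maps concept ID -> branch; `branch_sizes` maps branch -> number
    of concepts the terminology holds in that branch (both hypothetical).
    """
    seen = defaultdict(set)
    for concept_id in set(corpus_concepts):  # unique concepts only
        branch = branch_of.get(concept_id)
        if branch is not None:
            seen[branch].add(concept_id)
    # Only branches with at least one corpus concept ("signal") are returned.
    return {
        branch: len(ids) / branch_sizes[branch]
        for branch, ids in seen.items()
        if branch_sizes.get(branch)
    }


# Hypothetical terminology with three branches; the corpus touches one of them.
branch_of = {"C1": "Anatomy", "C2": "Anatomy", "C3": "Diseases"}
branch_sizes = {"Anatomy": 4, "Diseases": 10, "Chemicals": 5}
print(terminology_coverage(["C1", "C1", "C2"], branch_of, branch_sizes))
```

Repeated annotations of the same concept do not raise this ratio, since only unique concepts are counted.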
Annotation topic coverage
Annotation topic coverage = annotation-weighted branch count ÷ all identifiers for that corpus and entity scope, including deprecated identifiers in the denominator.
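A minimal sketch of the annotation topic coverage ratio, under two labeled assumptions: "all identifiers" is read as every identifier occurrence in the corpus annotations for the selected scope, and deprecated identifiers are those with no entry in the hypothetical `branch_of` lookup. Because deprecated identifiers stay in the denominator, the branch shares can sum to less than 1.

```python
from collections import Counter


def annotation_topic_coverage(annotation_ids, branch_of):
    """Annotation-weighted branch count / all identifier occurrences.

    `annotation_ids` lists one identifier per annotation (repeats allowed);
    `branch_of` maps current concept IDs to a branch (hypothetical lookup).
    """
    total = len(annotation_ids)  # includes deprecated or unmapped identifiers
    if total == 0:
        return {}
    branch_counts = Counter(
        branch_of[concept_id]
        for concept_id in annotation_ids
        if concept_id in branch_of
    )
    return {branch: n / total for branch, n in branch_counts.items()}


# Hypothetical annotations; "OLD9" stands in for a deprecated identifier.
branch_of = {"C1": "Anatomy", "C3": "Diseases"}
print(annotation_topic_coverage(["C1", "C1", "C3", "OLD9"], branch_of))
```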