Classify and extract text 10x better and faster 🦾


Genia Dataset

Created by Kim et al. in 2003, the Genia dataset contains 1,999 MEDLINE abstracts selected with a PubMed query for three MeSH terms: "human", "blood cells", and "transcription factors". The English-language corpus is annotated for part-of-speech, constituency syntax, terms, events, relations, and coreference. The 1,999 abstracts are distributed in text and XML formats.

Dataset Sources

Here you can download the Genia dataset in text and XML formats.

Download Genia dataset Text, XML files
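Once the XML files are downloaded, the annotated terms can be pulled out with Python's standard `xml.etree.ElementTree`. This is a minimal sketch that assumes the term-annotation markup of the GENIA term corpus: `<sentence>` elements that contain possibly nested `<cons>` elements, each carrying a `sem` attribute with a class such as `G#protein_molecule`. The `SAMPLE` excerpt and the `extract_terms` helper are hypothetical illustrations rather than part of the distribution, so check the element and attribute names against the files you actually download.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical excerpt written in the style of the GENIA term corpus XML.
# The element names (set, article, sentence, cons) and the sem/lex
# attributes are assumptions about the schema, not guaranteed.
SAMPLE = """<set>
<article>
<articleinfo><bibliomisc>MEDLINE:95369245</bibliomisc></articleinfo>
<abstract>
<sentence><cons lex="IL-2_gene_expression" sem="G#other_name"><cons lex="IL-2_gene" sem="G#DNA_domain_or_region">IL-2 gene</cons> expression</cons> requires <cons lex="NF-kappa_B" sem="G#protein_molecule">NF-kappa B</cons>.</sentence>
</abstract>
</article>
</set>"""


def extract_terms(xml_text):
    """Return (surface text, semantic class) pairs for every <cons> element."""
    root = ET.fromstring(xml_text)
    terms = []
    # iter() walks the tree in document order and also visits nested
    # <cons> elements, so terms embedded inside longer terms are kept.
    for cons in root.iter("cons"):
        # itertext() joins the element's own text with its children's text,
        # giving the full surface string of the term.
        surface = "".join(cons.itertext())
        # get() returns None when a term has no sem attribute.
        terms.append((surface, cons.get("sem")))
    return terms


terms = extract_terms(SAMPLE)
for surface, sem in terms:
    print(surface, "->", sem)

# Count how many terms fall into each semantic class.
print(Counter(sem for _, sem in terms))
```

Because `iter()` visits nested elements, overlapping terms such as "IL-2 gene" inside "IL-2 gene expression" are both reported. If some annotations in the real files lack a `sem` attribute or use a more complex value, `extract_terms` passes them through as `None` or as the raw string, so any class normalization would need to happen afterwards.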

Fine-tune with Genia dataset

Metatext is a no-code tool for training, tuning, and integrating custom NLP models.

Paper

Read the full original Genia paper.

Download PDF paper


Metatext helps you classify and extract information from text and documents using language models customized with your data and expertise.