Classify and extract text 10x better and faster 🦾

Polish Parliamentary Corpus (PPC) Dataset

Created by Maciej Ogrodniczuk in 2018, the Polish Parliamentary Corpus (PPC) dataset is a collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). It is based on the Polish Sejm Corpus. The corpus is in Polish and is distributed in XML file format; its size is listed as 3,000+.

Dataset Sources

Here you can download the Polish Parliamentary Corpus (PPC) dataset in XML format.

Download Polish Parliamentary Corpus (PPC) dataset XML files
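Once the XML files are downloaded, a quick way to get oriented is to count which element names occur in them. The following is a minimal sketch using only the Python standard library. It assumes nothing about the corpus schema. The function names and the `ppc_xml` directory are hypothetical placeholders, not part of the dataset or of any Metatext tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip a namespace prefix such as '{http://...}p' down to 'p'."""
    return tag.rsplit("}", 1)[-1]


def count_tags(xml_path) -> Counter:
    """Count how often each element name occurs in one XML file."""
    counts = Counter()
    # iterparse streams the file, so a large file need not fit in memory.
    for _event, elem in ET.iterparse(str(xml_path), events=("end",)):
        counts[local_name(elem.tag)] += 1
        elem.clear()  # drop text and children that were already counted
    return counts


def summarize_corpus(root_dir) -> Counter:
    """Sum element counts over every .xml file below root_dir."""
    total = Counter()
    for path in sorted(Path(root_dir).rglob("*.xml")):
        total += count_tags(path)
    return total


if __name__ == "__main__":
    # "ppc_xml" is a hypothetical folder holding the extracted download.
    for name, n in summarize_corpus("ppc_xml").most_common(10):
        print(f"{name}\t{n}")
```

The most frequent element names usually point to the units worth extracting (for example paragraphs or utterances), which can then guide a schema-specific parser.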

Fine-tune with the Polish Parliamentary Corpus (PPC) dataset

Metatext is a no-code tool for training, tuning, and integrating custom NLP models.

➡️ Learn more

Paper

Read the full original Polish Parliamentary Corpus (PPC) paper.

Download PDF paper


Metatext helps you classify and extract information from text and documents using language models customized with your data and expertise.
