Polish Parliamentary Corpus (PPC) Dataset
Created by Maciej Ogrodniczuk in 2018, the Polish Parliamentary Corpus (PPC) Dataset is a collection of linguistically analysed documents from the proceedings of the Polish Parliament (Sejm and Senate). It is based on the Polish Sejm Corpus and is in Polish. The listing gives its size as 3,000+, distributed in XML file format.
Dataset Sources
Here you can download the Polish Parliamentary Corpus (PPC) dataset in XML format.
Download Polish Parliamentary Corpus (PPC) dataset XML files
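Once the XML files are downloaded, they can be read with Python's standard library. The following is only a minimal sketch, not the corpus authors' tooling: the element name `p` for a paragraph of text, the directory name `ppc_xml`, and the helper names `local_name`, `extract_paragraphs` and `read_corpus` are all assumptions made for illustration, so check the actual files and adjust the tag name before relying on it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}p' down to 'p'."""
    return tag.rsplit("}", 1)[-1]


def extract_paragraphs(root, tag="p"):
    """Return the whitespace-normalized text of every element named `tag`.

    `tag` is an assumed element name; the real corpus schema may differ.
    """
    paragraphs = []
    for element in root.iter():
        # Skip non-element nodes whose tag is not a plain string.
        if not isinstance(element.tag, str):
            continue
        if local_name(element.tag) == tag:
            # Join text from nested children, then collapse runs of whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                paragraphs.append(text)
    return paragraphs


def read_corpus(directory, tag="p"):
    """Yield (file name, paragraphs) for each .xml file under `directory`."""
    for path in sorted(Path(directory).rglob("*.xml")):
        root = ET.parse(path).getroot()
        yield path.name, extract_paragraphs(root, tag)


if __name__ == "__main__":
    # "ppc_xml" is a hypothetical folder holding the downloaded files.
    for name, paragraphs in read_corpus("ppc_xml"):
        print(name, len(paragraphs))
```

Matching on the local tag name rather than the fully qualified one keeps the sketch usable whether or not the files declare an XML namespace.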
Fine-tune with Polish Parliamentary Corpus (PPC) dataset
Metatext is a no-code tool for training, tuning, and integrating custom NLP models.
Paper
Read the full original Polish Parliamentary Corpus (PPC) paper.