This is NKJP1M-SGJP — NKJP1M re-annotated using the Morfeusz SGJP tagset
========================================================================

NKJP1M is a 1-million-word manually annotated subcorpus of the
National Corpus of Polish (NKJP). It is the main resource used for
training taggers of Polish. Unfortunately, NKJP was annotated
according to a tagset that differs somewhat from the tagset of the
morphological analyser Morfeusz SGJP.

Here, we present NKJP1M-SGJP, a version of NKJP1M re-annotated in
accordance with the tagset of Morfeusz SGJP. Thus, taggers compatible
with Morfeusz can be trained on it without any tagset conversion. We
intend to maintain this version of the corpus, both by correcting
errors and by keeping it compatible with Morfeusz.

This version is based on NKJP1M v. 1.2 and inherits its licence
(cf. http://clip.ipipan.waw.pl/NationalCorpusOfPolish).

Compared to NKJP1M v. 1.2, this version includes numerous manual
corrections of typos and annotation errors.

Besides the ann_segmentation and ann_morphosyntax layers, this version
includes the ann_named, ann_words, and ann_groups layers, which were
automatically mapped to the new morphosyntax. For about 40 segments,
no exact match could be found in the changed morphosyntax. These
segments will require manual correction; for now, they are marked
with an identifier containing the string #missing_.
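To locate the segments that still need manual correction, one can scan
the automatically mapped layers for the #missing_ marker. The Python
sketch below is illustrative only. It assumes that the corpus is
unpacked into per-text directories containing uncompressed files named
ann_named.xml, ann_words.xml and ann_groups.xml, and that the marker
appears inside an XML attribute value (for example a reference
target). The function name find_missing is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: layer file names and marker string; adjust these if the
# distribution is laid out differently (e.g. gzipped files).
LAYERS = ("ann_named.xml", "ann_words.xml", "ann_groups.xml")
MARKER = "#missing_"


def find_missing(corpus_root):
    """Hypothetical helper: yield (path, tag, attribute, value) for every
    attribute value containing MARKER in the automatically mapped layers
    found anywhere under corpus_root."""
    root = Path(corpus_root)
    for layer in LAYERS:
        # rglob finds the layer file in every per-text subdirectory.
        for path in sorted(root.rglob(layer)):
            # iterparse parses incrementally; "end" fires once an
            # element (including its attributes) has been fully read.
            for _event, elem in ET.iterparse(path, events=("end",)):
                for name, value in elem.attrib.items():
                    if MARKER in value:
                        yield path, elem.tag, name, value
                # Free memory for elements that were already inspected.
                elem.clear()
```

For example, `for hit in find_missing("NKJP1M-SGJP"): print(*hit)`
lists every occurrence, where "NKJP1M-SGJP" stands for the
(hypothetical) path to the unpacked corpus. Tags are reported in
ElementTree's {namespace}name form.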

Contact for this version of the corpus: morfeusz@ipipan.waw.pl