Part-of-speech (POS) taggers and dependency parsers tend to work well on homogeneous datasets, but their performance suffers on datasets containing data from different genres. In our current work, we investigate how to create POS tagging and dependency parsing experts for heterogeneous data by employing topic modeling. We create topic models (using Latent Dirichlet Allocation) to determine genres from a heterogeneous dataset and then train an expert for each of the genres.
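The pipeline described in the abstract (infer a topic per sentence, split the data by topic, train one expert per split) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-sentence topic distributions have already been produced by some LDA model, assumes a hard assignment of each sentence to its most probable topic, and uses a toy most-frequent-tag lookup in place of a real POS tagger. All function names and the example data are hypothetical.

```python
from collections import Counter, defaultdict


def assign_topic(topic_probs):
    """Return the index of the most probable topic (hard assignment)."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def partition_by_topic(sentences, topic_dists):
    """Group tagged sentences by their most probable topic."""
    parts = defaultdict(list)
    for sent, probs in zip(sentences, topic_dists):
        parts[assign_topic(probs)].append(sent)
    return dict(parts)


def train_expert(tagged_sentences):
    """Toy stand-in for a real tagger: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(experts, words, topic_probs, default="NOUN"):
    """Route a sentence to the expert for its topic and tag it word by word."""
    expert = experts.get(assign_topic(topic_probs), {})
    return [expert.get(word, default) for word in words]


# Hypothetical toy data: two tagged sentences and assumed LDA topic
# distributions over two topics (values invented for illustration).
sentences = [
    [("stocks", "NOUN"), ("fall", "VERB")],
    [("the", "DET"), ("fall", "NOUN"), ("arrived", "VERB")],
]
topic_dists = [[0.9, 0.1], [0.2, 0.8]]

experts = {
    topic: train_expert(part)
    for topic, part in partition_by_topic(sentences, topic_dists).items()
}
```

In this toy setup the ambiguous word "fall" receives a different tag depending on which topic expert the sentence is routed to, which is the intuition behind training one expert per genre.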
@inproceedings{mukherjeeetal17eacl,
  title={Creating POS Tagging and Dependency Parsing Experts via Topic Modeling},
  author={Atreyee Mukherjee and Sandra Kuebler and Matthias Scheutz},
  year={2017},
  booktitle={EACL},
  url={https://hrilab.tufts.edu/publications/mukherjeeetal17eacl.pdf}
}