Synthetic Documents for Layout Recognition

This dataset contains synthetic document images together with labels and bounding boxes for their layout elements. The documents span three domains: articles, resumes, and forms. We focus mainly on document structure and produce visually unique samples that capture complex and diverse layouts. The layout categories include generic elements such as titles, sections, headers/footers, tables, and figures, as well as domain-specific elements such as equations, skills, profiles, questions, and answers.
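To make the label-plus-bounding-box idea concrete, here is a minimal Python sketch of how one annotated layout element could be held in memory. It is not the dataset's actual file format. The class name `LayoutElement`, the helper `count_by_label`, the exact label strings, and the `(x_min, y_min, x_max, y_max)` pixel box convention are all assumptions made for illustration. Only the category names come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Category names taken from the description above. The exact label strings
# used in the dataset files are an assumption.
GENERIC_LABELS = {"title", "section", "header", "footer", "table", "figure"}
DOMAIN_LABELS = {"equation", "skills", "profile", "question", "answer"}


@dataclass(frozen=True)
class LayoutElement:
    """One annotated layout element (hypothetical record shape)."""

    label: str
    # Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
    bbox: Tuple[float, float, float, float]

    def area(self) -> float:
        """Box area in square pixels; degenerate boxes yield 0."""
        x0, y0, x1, y1 = self.bbox
        return max(0.0, x1 - x0) * max(0.0, y1 - y0)

    def is_generic(self) -> bool:
        """True if the label is one of the domain-independent categories."""
        return self.label in GENERIC_LABELS


def count_by_label(elements: Iterable[LayoutElement]) -> Dict[str, int]:
    """Count how many elements carry each label."""
    counts: Dict[str, int] = {}
    for element in elements:
        counts[element.label] = counts.get(element.label, 0) + 1
    return counts
```

A structure like this makes it easy to compute per-category statistics for a page, for example `count_by_label(page_elements)`, before converting to whatever format a particular detection model expects.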

Sample Synthetic Document With Annotation


1.  Synthetic Document Generator for Annotation-free Layout Recognition.
    N Raman, S Shah, and M Veloso.
    Pattern Recognition, 2022.

