In this project, we show the benefits of fine-tuning table detection models from models trained on a close domain, namely the TableBank dataset. In addition, we provide all the tools needed to use the constructed models and to fine-tune new detection models with custom datasets.
We started by training several models on the TableBank dataset. All the models, results, and tools are available on the TableBank page of this repository.
You can use the trained models with the following notebooks.
Starting from the models trained on the TableBank dataset, we have fine-tuned models for table detection in different sources. This process is explained in the Model Zoo for table detection page, where we show the benefits of fine-tuning from the TableBank models compared to starting from models trained on natural images.
We provide the tools needed to create custom table detection models, using the models that we trained on the TableBank dataset as a basis. The instructions are provided on the Fine-tuning page.
Use this BibTeX entry to cite this work:
@misc{CasadoGarcia19,
title={The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images},
author={A. Casado-García and C. Domínguez and J. Heras and E. Mata and V. Pascual},
year={2019},
note={\url{https://github.com/holms-ur/fine-tuning/}},
}
This work was partially supported by Ministerio de Economía y Competitividad [MTM2017-88804-P], Ministerio de Ciencia, Innovación y Universidades [RTC-2017-6640-7], Agencia de Desarrollo Económico de La Rioja [2017-I-IDD-00018], and the computing facilities of the Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF). CETA-CIEMAT belongs to CIEMAT and the Government of Spain.