[16.0][ADD] account_edi_simple_pdf #1091

Open
wants to merge 67 commits into
base: 16.0
10adbcd
Initial check-in of account_invoice_import_simple_pdf
alexis-via Jul 26, 2021
c04d11a
Temporary workaround of odoo bug https://github.com/odoo/odoo/issues/…
alexis-via Aug 24, 2021
4916e48
Improve handling of start/end string cut when there are multiplue occ…
alexis-via Aug 31, 2021
9c49f33
[FIX] when the invoice has the VAT number of the supplier and also th…
alexis-via Sep 15, 2021
0142951
[FIX] Read of the "Page Analysis" parameter
alexis-via Oct 13, 2021
b9ddc00
Improve date extraction when month is a string with accents
alexis-via Oct 24, 2021
de5c9d5
account_invoice_import_simple_pdf: coma -> comma (typo fix)
alexis-via Oct 25, 2021
60c41e1
account_invoice_import_simple_pdf: easier extensibility for new fields
alexis-via Oct 25, 2021
4d2b3ab
[UPD] Update account_invoice_import_simple_pdf.pot
oca-travis Oct 26, 2021
cea112a
[UPD] README.rst
OCA-git-bot Oct 26, 2021
19852dd
[FIX] account_invoice_import_simple_pdf: Remove exclude with invoice2…
etobella Nov 29, 2021
1734a1b
[FIX] account_invoice_import_simple_pdf: extract_rule position_min/po…
alexis-via Feb 8, 2022
99d0369
account_invoice_import_simple_pdf: add onchange on date_format on fie…
alexis-via Feb 8, 2022
41977b5
account_invoice_import_simple_pdf: install pymupdf by Debian package …
alexis-via Feb 8, 2022
5ce7b07
same player try again
alexis-via Feb 8, 2022
ef3c731
Same player try again again
alexis-via Feb 8, 2022
9996f34
Same player try again again again
alexis-via Feb 8, 2022
1a0dead
account_invoice_import_simple_pdf: support multiple tools for text ex…
alexis-via Feb 12, 2022
8b59b74
[FIX] access to form view of partners for users who are not accountants
alexis-via Feb 13, 2022
52b5a77
[UPD] Update account_invoice_import_simple_pdf.pot
Feb 14, 2022
9704c67
[UPD] README.rst
OCA-git-bot Feb 14, 2022
9ecf77c
account_invoice_import_simple_pdf 14.0.2.0.0
OCA-git-bot Feb 14, 2022
9a8f3e9
[FIX] account_invoice_import_simple_pdf: Fix view replace
etobella Mar 31, 2022
f5493a4
Use fix version of dateparser
flotho Apr 26, 2022
76cd3a2
account_invoice_import_simple_pdf 14.0.2.1.0
OCA-git-bot Jun 16, 2022
488795a
account_invoice_import_simple_pdf: add apostrophe as thousand separator
alexis-via Jun 29, 2022
fb22237
account_invoice_import_simple_pdf: parse July 5th, 2022 as date
alexis-via Jun 29, 2022
7750231
[UPD] Update account_invoice_import_simple_pdf.pot
Jun 29, 2022
d423c86
account_invoice_import_simple_pdf 14.0.2.2.0
OCA-git-bot Jun 29, 2022
7d182d8
[FIX] simple_pdf: bad string
alexis-via Jul 14, 2022
a6d1f54
[UPD] Update account_invoice_import_simple_pdf.pot
Jul 15, 2022
2c2def9
account_invoice_import_simple_pdf 14.0.2.2.1
OCA-git-bot Jul 15, 2022
61c1068
account_invoice_import: improve handling of simple PDF invoices
alexis-via Jul 14, 2022
9a03d1c
Added translation using Weblate (French)
klodr Aug 5, 2022
0f978a8
Translated using Weblate (French)
klodr Aug 5, 2022
26908d4
Translated using Weblate (French)
Sep 20, 2022
8ae25d6
simple_pdf: add warning about constraint on regex version imposed by …
alexis-via Sep 21, 2022
7f0f672
simple_pdf: use another invoice as test invoice
alexis-via Sep 22, 2022
1e93173
simple_pdf: raise error if thousand sep = decimal sep
alexis-via Sep 27, 2022
c67c264
[UPD] Update account_invoice_import_simple_pdf.pot
Sep 27, 2022
57cfcf1
[UPD] README.rst
OCA-git-bot Sep 27, 2022
a1560e4
account_invoice_import_simple_pdf 14.0.3.0.0
OCA-git-bot Sep 27, 2022
c8695f9
Update translation files
oca-transbot Sep 27, 2022
7810567
simple_pdf: allow to match partners on additionnal fields
alexis-via Oct 8, 2022
7f0d21b
[UPD] Update account_invoice_import_simple_pdf.pot
Oct 12, 2022
8b8c8c1
account_invoice_import_simple_pdf 14.0.3.1.0
OCA-git-bot Oct 12, 2022
a283bcb
Update translation files
oca-transbot Oct 12, 2022
b43db49
Translated using Weblate (French)
klodr May 29, 2023
96885a9
simple_pdf: add a new type 'Any Character' on invoice number parsing
alexis-via May 30, 2023
98b68c8
account_invoice_import_simple_pdf 14.0.3.2.0
OCA-git-bot Jun 6, 2023
a550679
[FIX] remove pin version of dateparser
florian-dacosta Jan 24, 2023
efbfb7f
[UPD] README.rst
OCA-git-bot Sep 3, 2023
eca7404
[UPD] README.rst
OCA-git-bot Sep 3, 2023
54da3eb
account_invoice_import_simple_pdf 14.0.3.2.1
OCA-git-bot Sep 3, 2023
c7b9f13
[UPD] README.rst
OCA-git-bot Sep 3, 2023
dd899eb
simple_pdf: works with newer PyMuPDF versions
alexis-via Oct 20, 2023
59d7b6e
[BOT] post-merge updates
OCA-git-bot Oct 24, 2023
0adf563
Added translation using Weblate (Spanish)
Ivorra78 Nov 25, 2023
c7f459e
Translated using Weblate (Spanish)
Ivorra78 Nov 25, 2023
466abf9
Translated using Weblate (Spanish)
Ivorra78 Nov 25, 2023
4b7c35f
account_invoice_import_simple_pdf: remove pdfplumber
alexis-via Feb 13, 2024
552c26f
account_invoice_import_simple_pdf: update INSTALL.rst about version o…
alexis-via Feb 13, 2024
b2d8adb
[BOT] post-merge updates
OCA-git-bot Mar 14, 2024
9cbab4e
[ADD] account_edi_simple_pdf
hbrunn Mar 29, 2024
b002d77
[MIG] account_edi_simple_pdf: migration to 16.0
hbrunn Dec 4, 2024
772657b
[ADD] account_edi_simple_pdf: hbrunn as maintainer
hbrunn Dec 4, 2024
5b30e74
[IMP] account_edi_simple_pdf: increase test coverage
hbrunn Dec 4, 2024
228 changes: 228 additions & 0 deletions account_edi_simple_pdf/README.rst
@@ -0,0 +1,228 @@
=================
Import Simple PDF
=================

..
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   !! This file is generated by oca-gen-addon-readme !!
   !! changes will be overwritten.                   !!
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   !! source digest: sha256:5ab1ebb7747c3603828622a11eb4d2fe9419bb3ccda3f1bbbcaa733014ddcb54
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

.. |badge1| image:: https://img.shields.io/badge/maturity-Beta-yellow.png
    :target: https://odoo-community.org/page/development-status
    :alt: Beta
.. |badge2| image:: https://img.shields.io/badge/licence-AGPL--3-blue.png
    :target: http://www.gnu.org/licenses/agpl-3.0-standalone.html
    :alt: License: AGPL-3
.. |badge3| image:: https://img.shields.io/badge/github-OCA%2Fedi-lightgray.png?logo=github
    :target: https://github.com/OCA/edi/tree/16.0/account_edi_simple_pdf
    :alt: OCA/edi
.. |badge4| image:: https://img.shields.io/badge/weblate-Translate%20me-F47D42.png
    :target: https://translation.odoo-community.org/projects/edi-16-0/edi-16-0-account_edi_simple_pdf
    :alt: Translate me on Weblate
.. |badge5| image:: https://img.shields.io/badge/runboat-Try%20me-875A7B.png
    :target: https://runboat.odoo-community.org/builds?repo=OCA/edi&target_branch=16.0
    :alt: Try me on Runboat

|badge1| |badge2| |badge3| |badge4| |badge5|

This module extends Odoo's vendor bill import mechanism with support for simple PDF invoices, i.e. PDF invoices that don't have an embedded XML file.

* Possibility to add support for a new vendor without developer skills: the accountant can do it!
* Adding support for a new vendor is faster.
* More tolerance on vendor invoice layout changes.
* Easier to install.

This module uses the following design when importing a PDF vendor bill:

1. raw text extraction of the PDF file,
2. identify the partner using the VAT number (if the VAT number is present in the raw text extraction) or some keywords,
3. use regular expressions (regex) to extract the data needed to create the vendor bill in Odoo (single line configuration).

Under the hood, regular expressions are auto-generated from the configuration made by the user in Odoo. No need to be a regex expert! But you can still write regexes to extract some fields for some very specific needs.
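
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module's actual code: the ``VENDORS`` dictionary, its keys and the helper names are hypothetical, and the standard ``re`` module stands in for ``regex``.

```python
import re
from decimal import Decimal

# Hypothetical vendor configuration (assumption): the keys only mimic the
# kind of per-vendor settings described above, not the real field names.
VENDORS = {
    "FR74397480930": {"name": "Bouygues Telecom", "decimal_sep": ",", "thousand_sep": " "},
}


def find_vendor(raw_text, vendors):
    """Step 2: return the (vat, config) pair whose VAT number appears in the text."""
    compact = re.sub(r"\s", "", raw_text)  # VAT numbers are often printed with spaces
    for vat, config in vendors.items():
        if vat in compact:
            return vat, config
    return None, None


def build_amount_regex(config):
    """Step 3: generate an amount regex from the vendor's separators."""
    thousand = re.escape(config["thousand_sep"])
    decimal = re.escape(config["decimal_sep"])
    return re.compile(rf"\d{{1,3}}(?:{thousand}\d{{3}})*{decimal}\d{{2}}")


def extract_amounts(raw_text, config):
    """Return every amount found in the raw text as a Decimal."""
    amounts = []
    for match in build_amount_regex(config).findall(raw_text):
        normalized = match.replace(config["thousand_sep"], "")
        normalized = normalized.replace(config["decimal_sep"], ".")
        amounts.append(Decimal(normalized))
    return amounts
```

With such a list of amounts, a rule like "take the largest amount as the total" reduces to ``max(extract_amounts(text, config))``.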

The module can extract the following fields:

* Total Amount with taxes
* Total Untaxed Amount
* Total Tax Amount
* Invoice Date
* Due Date
* Start Date
* End Date
* Invoice Number
* Description (for that field, you have to write a regex)

In this list, only 3 fields are required:

* Invoice Date
* 2 out of the 3 Amount fields (the 3rd can be deduced from the 2 others: Total Amount = Total Untaxed + Total Tax)
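
As a minimal sketch of that rule (the function name and signature are hypothetical, chosen only for illustration), the missing amount follows directly from Total Amount = Total Untaxed + Total Tax:

```python
from decimal import Decimal


def complete_amounts(total=None, untaxed=None, tax=None):
    """Fill in the one missing amount using total = untaxed + tax."""
    known = sum(value is not None for value in (total, untaxed, tax))
    if known < 2:
        raise ValueError("At least 2 of the 3 amounts are required")
    if total is None:
        total = untaxed + tax
    elif untaxed is None:
        untaxed = total - tax
    elif tax is None:
        tax = total - untaxed
    return total, untaxed, tax
```

For example, ``complete_amounts(total=Decimal("120.00"), untaxed=Decimal("100.00"))`` fills in a tax amount of ``Decimal("20.00")``.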

To take advantage of the fields *Start Date* and *End Date*, you need the OCA module *account_invoice_start_end_dates* from the `account-closing <https://github.com/OCA/account-closing>`_ project.

**Table of contents**

.. contents::
   :local:

Installation
============

The most important technical component of this module is the tool that converts the PDF to text. Converting PDF to text is not an easy job. As outlined in this `blog post <https://dida.do/blog/how-to-extract-text-from-pdf>`_, different tools can give quite different results. The best results are usually achieved with tools based on a PDF viewer, which excludes pure-Python tools. But pure-Python tools are easier to install than tools based on a PDF viewer. It is important to understand that, if you change the PDF to text tool, you will almost certainly get a slightly different text output, which may force you to update the field extraction rules. That can be time-consuming if you have already configured many vendors.

The module supports 4 different extraction methods:

1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
#. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
#. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
#. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF libraries for Python. pypdf is a pure-Python solution, so it's very easy to install on all OSes.

PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python library, so you will always be able to install it whatever your technical environment is.

You can choose one extraction method and only install the tools/libs for that method.

Install PyMuPDF
~~~~~~~~~~~~~~~

Install it via pip:

.. code::

  pip3 install --upgrade pymupdf

Beware that *PyMuPDF* is not a pure-Python library: it uses MuPDF, which is written in C. If a Python wheel for your OS, CPU architecture and Python version is available on PyPI (check the `list of PyMuPDF wheels <https://pypi.org/project/PyMuPDF/#files>`_ on PyPI), it will install smoothly. Otherwise, the installation via pip will require MuPDF and all its development libs to compile the binding.

Install pdftotext python lib
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To install **pdftotext python lib**, run:

.. code::

  sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

and then install the lib via pip:

.. code::

  pip3 install --upgrade pdftotext

On OSes other than Debian/Ubuntu, follow the instructions on the `project page <https://github.com/jalan/pdftotext>`_.

Install pdftotext command line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To install **pdftotext command line**, run:

.. code::

  sudo apt install poppler-utils

Install pypdf
~~~~~~~~~~~~~

To install the **pypdf** python lib, run:

.. code::

  pip3 install --upgrade pypdf


Other requirements
~~~~~~~~~~~~~~~~~~

This module also requires the following Python libraries:

* `regex <https://pypi.org/project/regex/>`_ which is backward-compatible with the *re* module of the Python standard library, but has additional functionalities.
* `dateparser <https://github.com/scrapinghub/dateparser>`_ which is a powerful date parsing library.

The dateparser lib depends itself on regex. So you can install these Python libraries via pip with the following command:

.. code::

  pip3 install --upgrade dateparser

The dateparser lib is not compatible with all regex lib versions. As of February 2024, the `version requirement <https://github.com/scrapinghub/dateparser/blob/master/setup.py#L36>`_ declared by dateparser for regex is **!=2019.02.19, !=2021.8.27**. So the latest version of dateparser is currently compatible with the latest version of regex. To know the version of regex installed in your environment, run:


.. code::

  pip3 show regex
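
The same check can be done from Python. The sketch below is only an illustration: the helper names are hypothetical, the excluded versions are the two quoted above and would need updating if dateparser changes its requirement.

```python
from importlib import metadata

# Versions excluded by dateparser as quoted above (!=2019.02.19, !=2021.8.27).
EXCLUDED_REGEX_VERSIONS = {(2019, 2, 19), (2021, 8, 27)}


def parse_version(version):
    """Turn '2021.8.27' or '2019.02.19' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_regex_version_ok(version):
    """Return True if this regex version is not one of the excluded ones."""
    return parse_version(version) not in EXCLUDED_REGEX_VERSIONS


def installed_regex_version():
    """Return the installed regex version string, or None if it is missing."""
    try:
        return metadata.version("regex")
    except metadata.PackageNotFoundError:
        return None
```

Calling ``is_regex_version_ok(installed_regex_version())`` only makes sense when the second function does not return ``None``.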

Configuration
=============

By default, for the PDF to text conversion, the module tries the different methods in the order listed in the Installation section: it first tries **PyMuPDF**; if that fails (for example because the lib is not properly installed), it tries the **pdftotext python lib**; if that one also fails, it tries the **pdftotext command line**; and if that fails too, it finally tries **pypdf**. If none of the 4 methods yields any text, Odoo will display an error message. Whether the extracted text can then be parsed successfully is another matter.
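
The fallback order described above can be pictured with the following sketch. It is not the module's implementation: ``pdf_to_text`` and the two extractor helpers are hypothetical names, and only two of the four methods are shown to keep the example short.

```python
import subprocess


def extract_with_pypdf(path):
    """Pure-Python extraction; requires the pypdf package."""
    from pypdf import PdfReader  # lazy import: a missing lib only fails here

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_pdftotext_cmd(path):
    """Extraction via the pdftotext command line tool ('-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def pdf_to_text(path, extractors):
    """Try each (name, function) pair in order; return the first non-empty text."""
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # missing lib, missing binary, unreadable file...
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: no text extracted")
    raise RuntimeError("No extraction method worked:\n" + "\n".join(errors))
```

A call such as ``pdf_to_text("invoice.pdf", [("pdftotext.cmd", extract_with_pdftotext_cmd), ("pypdf", extract_with_pypdf)])`` returns both the name of the method that succeeded and the extracted text, which makes it easy to see which tool produced the output the extraction rules were tuned against.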

If you want to force Odoo to use a specific text extraction method, go to the menu *Configuration > Technical > Parameters > System Parameters* and create a new System Parameter:

* *Key*: **invoice_import_simple_pdf.pdf2txt**
* *Value*: select the proper value for the method you want to use:

  1. pymupdf
  #. pdftotext.lib
  #. pdftotext.cmd
  #. pypdf

In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.
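
The same parameter can also be set from an Odoo shell instead of the menu. The helper below is a hypothetical convenience wrapper, not part of the module; it assumes ``env`` is the Odoo environment available in the shell and only relies on the ``set_param`` method of ``ir.config_parameter``.

```python
PDF2TXT_KEY = "invoice_import_simple_pdf.pdf2txt"
PDF2TXT_METHODS = ("pymupdf", "pdftotext.lib", "pdftotext.cmd", "pypdf")


def force_pdf2txt_method(env, method):
    """Force one text extraction method through the system parameter."""
    if method not in PDF2TXT_METHODS:
        raise ValueError(
            f"Unknown method {method!r}; expected one of {PDF2TXT_METHODS}"
        )
    # ir.config_parameter stores system parameters as key/value pairs.
    env["ir.config_parameter"].sudo().set_param(PDF2TXT_KEY, method)
```

In an Odoo shell, ``force_pdf2txt_method(env, "pypdf")`` followed by ``env.cr.commit()`` would persist the choice.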

You will find a full demonstration about how to configure each Vendor and import the PDF invoices in this `screencast <https://www.youtube.com/watch?v=edsEuXVyEYE>`_.

Usage
=====

- Go to Invoicing -> Vendors -> Bills.
- Press the *Upload* button and upload a PDF file. The PDF file will then be processed by account_edi_simple_pdf.

In case your PDF file contains Factur-X data and you have Factur-X activated (by the core module ``account_edi_facturx`` which is autoinstalled), that functionality will be executed instead.

Bug Tracker
===========

Bugs are tracked on `GitHub Issues <https://github.com/OCA/edi/issues>`_.
In case of trouble, please check there if your issue has already been reported.
If you spotted it first, help us to smash it by providing a detailed and welcomed
`feedback <https://github.com/OCA/edi/issues/new?body=module:%20account_edi_simple_pdf%0Aversion:%2016.0%0A%0A**Steps%20to%20reproduce**%0A-%20...%0A%0A**Current%20behavior**%0A%0A**Expected%20behavior**>`_.

Do not contact contributors directly about support or help with technical issues.

Credits
=======

Authors
~~~~~~~

* Akretion
* Hunki Enterprises BV

Contributors
~~~~~~~~~~~~

* Alexis de Lattre <[email protected]>

Maintainers
~~~~~~~~~~~

This module is maintained by the OCA.

.. image:: https://odoo-community.org/logo.png
   :alt: Odoo Community Association
   :target: https://odoo-community.org

OCA, or the Odoo Community Association, is a nonprofit organization whose
mission is to support the collaborative development of Odoo features and
promote its widespread use.

.. |maintainer-hbrunn| image:: https://github.com/hbrunn.png?size=40px
    :target: https://github.com/hbrunn
    :alt: hbrunn

Current `maintainer <https://odoo-community.org/page/maintainer-role>`__:

|maintainer-hbrunn|

This module is part of the `OCA/edi <https://github.com/OCA/edi/tree/16.0/account_edi_simple_pdf>`_ project on GitHub.

You are welcome to contribute. To learn how please visit https://odoo-community.org/page/Contribute.
1 change: 1 addition & 0 deletions account_edi_simple_pdf/__init__.py
@@ -0,0 +1 @@
from . import models
32 changes: 32 additions & 0 deletions account_edi_simple_pdf/__manifest__.py
@@ -0,0 +1,32 @@
# Copyright 2021 Akretion France (http://www.akretion.com/)
# @author: Alexis de Lattre <[email protected]>
# License AGPL-3.0 or later (http://www.gnu.org/licenses/agpl).

{
    "name": "Import Simple PDF",
    "version": "16.0.1.0.0",
    "category": "Accounting/Accounting",
    "license": "AGPL-3",
    "summary": "Import simple PDF vendor bills",
    "author": "Akretion,Hunki Enterprises BV,Odoo Community Association (OCA)",
    "website": "https://github.com/OCA/edi",
    "maintainers": ["hbrunn"],
    "depends": ["account_edi"],
    "external_dependencies": {
        "python": [
            "regex",
            "dateparser",
            "pypdf>=3.1.0",
        ],
        "deb": ["libmupdf-dev", "mupdf", "mupdf-tools", "poppler-utils"],
    },
    "data": [
        "security/ir.model.access.csv",
        "views/res_partner.xml",
        "views/account_invoice_import_simple_pdf_fields.xml",
        "views/account_invoice_import_simple_pdf_invoice_number.xml",
    ],
    "demo": ["demo/demo_data.xml"],
    "installable": True,
    "application": True,
}
62 changes: 62 additions & 0 deletions account_edi_simple_pdf/demo/demo_data.xml
@@ -0,0 +1,62 @@
<?xml version="1.0" encoding="utf-8" ?>
<odoo noupdate="1">

    <record id="mobile_phone" model="product.product">
        <field name="name">Mobile phone</field>
        <field name="categ_id" ref="product.product_category_5" />
        <field name="sale_ok" eval="False" />
        <field name="purchase_ok" eval="True" />
        <field name="type">service</field>
    </record>

    <record id="bouygues_telecom" model="res.partner">
        <field name="name">Bouygues Telecom</field>
        <field name="is_company" eval="True" />
        <field name="supplier_rank">1</field>
        <field name="street">37 rue Boissière</field>
        <field name="zip">75116</field>
        <field name="city">Paris</field>
        <field name="country_id" ref="base.fr" />
        <field name="website">http://www.bouyguestelecom.fr</field>
        <field name="vat">FR74397480930</field>
        <field name="simple_pdf_date_format">dd-mm-y4</field>
        <field name="simple_pdf_date_separator">slash</field>
        <field name="simple_pdf_currency_id" ref="base.EUR" />
        <field name="simple_pdf_pages">first</field>
        <field name="simple_pdf_decimal_separator">comma</field>
        <field name="simple_pdf_thousand_separator">space</field>
    </record>

    <record id="inv_number1" model="account.invoice.import.simple.pdf.invoice.number">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="string_type">digit</field>
        <field name="occurrence_min">14</field>
        <field name="occurrence_max">14</field>
    </record>

    <record id="inv_amount_total" model="account.invoice.import.simple.pdf.fields">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="name">amount_total</field>
        <field name="extract_rule">max</field>
    </record>

    <record id="inv_amount_untaxed" model="account.invoice.import.simple.pdf.fields">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="name">amount_untaxed</field>
        <field name="extract_rule">first</field>
        <field name="start">Montant de la facture soumis à TVA</field>
    </record>

    <record id="inv_date" model="account.invoice.import.simple.pdf.fields">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="name">date</field>
        <field name="extract_rule">first</field>
    </record>

    <record id="inv_invoice_number" model="account.invoice.import.simple.pdf.fields">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="name">invoice_number</field>
        <field name="extract_rule">first</field>
    </record>

</odoo>