Implement Burst Memory Access for RTIO DMA & Analyzer #2592

Open
wants to merge 6 commits into
base: master

@occheung (Contributor) commented Oct 10, 2024

ARTIQ Pull Request

Description of Changes

This PR changes the one-by-one memory accesses issued by the DMA and the analyzer into burst memory reads/writes. Whether a burst access can be issued is indicated by the corresponding streaming FIFO control signals within the DMA and the analyzer.
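To illustrate why bursts help, here is a toy cycle-count model. It is only a sketch: the function names, the access latency, and the burst length are all hypothetical and are not taken from this PR or from MiSoC. It assumes that a single access pays the full bus latency for every word, while a burst pays it once and then receives one word per cycle.

```python
from math import ceil


def cycles_single(n_words: int, access_latency: int) -> int:
    """Toy model: every word is fetched with its own bus transaction."""
    return n_words * access_latency


def cycles_burst(n_words: int, burst_len: int, access_latency: int) -> int:
    """Toy model: one transaction per burst, then one word per cycle."""
    n_bursts = ceil(n_words / burst_len)
    # Each burst pays the latency once for its first word; the remaining
    # n_words - n_bursts words each cost one additional cycle.
    return n_bursts * access_latency + (n_words - n_bursts)


# Hypothetical numbers, chosen only for illustration.
print(cycles_single(64, access_latency=10))              # 640
print(cycles_burst(64, burst_len=8, access_latency=10))  # 136
```

The real gain depends on the memory controller and on how often the streaming FIFOs allow a burst to start, so these numbers should not be compared with the measurements reported in this PR.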

Performance

Using the DMASaturate script from #946 with local I/O on the standalone/master variant:
Before: 490 mu
After: 65 mu
This is on par with the performance of the Zynq variants reported in #946.

The analyzer memory buffer was expanded to observe the RTIO slack (in machine units).
[image]
The end-to-beginning transition between DMA playbacks shows up as a sudden drop in RTIO slack. The slack then gradually recovers to its maximum value (SED FIFO depth * DMA events period / DMA events per period).
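The maximum-slack expression above can be written out as a small helper. This is a sketch of the arithmetic only: the function name and the example values (FIFO depth, period, events per period) are hypothetical and do not come from the measurement in this PR.

```python
def max_rtio_slack_mu(sed_fifo_depth: int,
                      dma_period_mu: int,
                      events_per_period: int) -> float:
    """Maximum slack = SED FIFO depth * DMA events period / DMA events per period.

    dma_period_mu / events_per_period is the average spacing between events,
    so the result is how far ahead a full SED FIFO can be scheduled.
    """
    return sed_fifo_depth * dma_period_mu / events_per_period


# Hypothetical values, for illustration only.
print(max_rtio_slack_mu(sed_fifo_depth=128,
                        dma_period_mu=1000,
                        events_per_period=8))  # 16000.0
```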

Test

Passes the unit tests in artiq.test with test_dma_playback_time re-enabled.

test_dma_playback_time (artiq.test.coredevice.test_rtio.DMATest.test_dma_playback_time) ... dt=0.080111704, dt/count=4.0055852e-06
ok
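As a quick sanity check on the printed figures, the event count can be recovered from the two numbers in the test output. This sketch assumes dt is reported in seconds, which the output above does not state.

```python
# Values copied from the test output above.
dt = 0.080111704              # total playback time (assumed to be seconds)
dt_per_event = 4.0055852e-06  # dt/count as printed by the test

# Recover the number of events the test played back.
count = round(dt / dt_per_event)
print(count)  # 20000

# Per-event time in microseconds, under the same unit assumption.
per_event_us = dt_per_event * 1e6
print(f"{per_event_us:.3f} us per event")
```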

TODOs:

  • Fix mysterious RPC crash
  • Fix timing for DRTIO masters

Related Issue

Closes #946

See MiSoC 151, MiSoC 150.

Type of Changes

Type
✨ New feature

Steps (Choose relevant, delete irrelevant before submitting)

All Pull Requests

  • Use correct spelling and grammar.
  • Update RELEASE_NOTES.rst if there are noteworthy changes, especially if there are changes to existing APIs.
  • Close/update issues.

Code Changes

  • Run flake8 to check code style (follow PEP-8 style). flake8 has issues with parsing Migen/gateware code, ignore as necessary.
  • Test your changes or have someone test them. Mention what was tested and how.
  • Add and check docstrings and comments
  • Check, test, and update the unittests in /artiq/test/ or gateware simulations in /artiq/gateware/test

Git Logistics

  • Split your contribution into logically separate changes (git rebase --interactive). Merge/squash/fixup commits that just fix or amend previous commits. Remove unintended changes & cleanup. See tutorial.
  • Write short & meaningful commit messages. Review each commit for messages (git show). Format:
    topic: description. < 50 characters total.
    
    Longer description. < 70 characters per line
    

Licensing

See copyright & licensing for more info.
ARTIQ files that do not contain a license header are copyrighted by M-Labs Limited and are licensed under LGPLv3+.

@occheung (Contributor, Author) commented

The full-stack test for DMA was examined.
[image]
It seems that 4 cycles per DMA event is the theoretical minimum period imposed by the RTIO system. Besides the well-documented 3-cycle latency of the RTIO core, TimeOffset needs an extra cycle to deassert stb and reassert it, even though a new CRI entry is already available from the record converter.

We can probably save the cycle in TimeOffset.
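The cycle budget described above can be turned into a rough per-event period. This is a back-of-the-envelope sketch: the 125 MHz clock frequency is a hypothetical value chosen for illustration, and the constant names are made up; only the cycle counts (3 + 1) come from the comment.

```python
RTIO_CORE_LATENCY_CYCLES = 3    # RTIO core latency, as stated above
TIME_OFFSET_STB_GAP_CYCLES = 1  # extra cycle for TimeOffset to toggle stb


def event_period_ns(cycles: int, sys_clk_hz: float) -> float:
    """Minimum time per DMA event for a given number of clock cycles."""
    return cycles * 1e9 / sys_clk_hz


SYS_CLK_HZ = 125e6  # hypothetical clock frequency, not taken from this PR

current = RTIO_CORE_LATENCY_CYCLES + TIME_OFFSET_STB_GAP_CYCLES
without_gap = RTIO_CORE_LATENCY_CYCLES

print(event_period_ns(current, SYS_CLK_HZ))      # 32.0
print(event_period_ns(without_gap, SYS_CLK_HZ))  # 24.0
```

Under these assumptions, saving the cycle in TimeOffset would shorten the minimum event period by a quarter.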

@occheung (Contributor, Author) commented Nov 25, 2024

Built on a DRTIO master without any peripherals. From timing report:

+====================+===================+============================================+
| Launch Setup Clock | Launch Hold Clock | Pin                                        |
+====================+===================+============================================+
| sys_clk            | sys_clk           | genericmaster_dma_rawslicer_buf_reg[158]/D |
| sys_clk            | sys_clk           | genericmaster_dma_rawslicer_buf_reg[135]/D |
| sys_clk            | sys_clk           | rtcontroller2_state_reg[1]/D               |
| sys_clk            | sys_clk           | genericmaster_dma_rawslicer_buf_reg[134]/D |
| sys_clk            | sys_clk           | genericmaster_dma_rawslicer_buf_reg[159]/D |
+--------------------+-------------------+--------------------------------------------+

It appears the timing of buf is difficult to satisfy. In addition, from subsequent physical synthesis:

INFO: [Physopt 32-710] Processed net genericmaster_dma_rawslicer_buf[373]_i_1_n_0. Critical path length was reduced through logic transformation on cell genericmaster_dma_rawslicer_buf[373]_i_1_comp_1.

Given that the RAM has a 128-bit-wide interface and the maximum RTIO output data width is 32 (due to the RTIO log channel), buf should use around 260 bits at most. I think Vivado is unable to purge the other 480 bits that come from the unused data width. (Edit: Well, it shouldn't be able to detect this; we literally use it as a watermark to decide whether to emit RTIO events or not.)

Manually passing a smaller out_size to the RawSlicer would pass timing.
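The bit counts quoted above can be reproduced with a short calculation. This is a sketch under two assumptions that should be checked against the RawSlicer source: that the buffer is sized as one incoming RAM word plus out_size - 1 bytes still being shifted out, and that a DMA record carries 104 header bits (length 8, channel 24, timestamp 64, address 8) followed by the data field. The helper names are hypothetical.

```python
GRANULARITY = 8             # bits per byte
RAM_WORD_BYTES = 128 // 8   # 128-bit RAM interface, from the comment above

# Assumed record header: length(8) + channel(24) + timestamp(64) + address(8).
HEADER_BITS = 8 + 24 + 64 + 8


def record_bytes(data_bits: int) -> int:
    """Size of one DMA record in bytes for a given data field width."""
    return (HEADER_BITS + data_bits) // GRANULARITY


def buf_bits(in_size: int, out_size: int) -> int:
    """Assumed worst-case buffer size: bytes being shifted out + new word."""
    return (out_size - 1 + in_size) * GRANULARITY


narrow = buf_bits(RAM_WORD_BYTES, record_bytes(32))   # 32-bit data field
wide = buf_bits(RAM_WORD_BYTES, record_bytes(512))    # assumed 512-bit data field

print(narrow)         # 256
print(wide)           # 736
print(wide - narrow)  # 480
```

Under these assumptions the narrow case matches the "around 260 bits" estimate and the difference matches the 480 bits mentioned above, which is consistent with a smaller out_size shrinking buf enough to ease timing.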

@occheung (Contributor, Author) commented

Built on a DRTIO master without any peripherals. From timing report ...

Likely caused by BRAM under-utilization when testing with changes in #2647. No tight setup and hold time pins are reported anymore.

Successfully merging this pull request may close these issues.

Kasli DMA sustained event rate