Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use uint16_t instead of uint32_t when writing to OCR4C register #4149

Merged
merged 1 commit into from
Nov 29, 2023

Conversation

gudnimg
Copy link
Collaborator

@gudnimg gudnimg commented Apr 15, 2023

This is just an idea to save more memory.

Using uint16_t instead of uint32_t reduces code size and probably is quicker to execute

OCR4C register is 2 bytes on ATmega2560
It's 1 byte on ATmega32u4 and ATmega16u4

image

Tested these changes on MK3S+. Editing print fan speed in the menus works fine and sound feedback works. I also ran a selftest and it seems to work the same as before.

Change in memory:
Flash: -80 bytes
SRAM: 0 bytes

@gudnimg
Copy link
Collaborator Author

gudnimg commented Apr 15, 2023

Example:

BEFORE

OCR4C = (((uint32_t)duty) * ((uint32_t)((TIMSK4 & (1 << OCIE4A))?OCR4A:255))) / ((uint32_t)255);
    e578:	30 e0       	ldi	r19, 0x00	; 0
    e57a:	50 e0       	ldi	r21, 0x00	; 0
    e57c:	40 e0       	ldi	r20, 0x00	; 0
    e57e:	80 91 72 00 	lds	r24, 0x0072	; 0x800072 <__TEXT_REGION_LENGTH__+0x7c2072>
    e582:	81 ff       	sbrs	r24, 1
    e584:	14 c0       	rjmp	.+40     	; 0xe5ae <setExtruderAutoFanState(unsigned char)+0xa6>
    e586:	60 91 a8 00 	lds	r22, 0x00A8	; 0x8000a8 <__TEXT_REGION_LENGTH__+0x7c20a8>
    e58a:	70 91 a9 00 	lds	r23, 0x00A9	; 0x8000a9 <__TEXT_REGION_LENGTH__+0x7c20a9>
    e58e:	90 e0       	ldi	r25, 0x00	; 0
    e590:	80 e0       	ldi	r24, 0x00	; 0
    e592:	0f 94 42 df 	call	0x3be84	; 0x3be84 <__mulsi3>
    e596:	2f ef       	ldi	r18, 0xFF	; 255
    e598:	30 e0       	ldi	r19, 0x00	; 0
    e59a:	40 e0       	ldi	r20, 0x00	; 0
    e59c:	50 e0       	ldi	r21, 0x00	; 0
    e59e:	0f 94 52 df 	call	0x3bea4	; 0x3bea4 <__udivmodsi4>
    e5a2:	30 93 ad 00 	sts	0x00AD, r19	; 0x8000ad <__TEXT_REGION_LENGTH__+0x7c20ad>
    e5a6:	20 93 ac 00 	sts	0x00AC, r18	; 0x8000ac <__TEXT_REGION_LENGTH__+0x7c20ac>

Note the call to __mulsi3 above:

Multiplication 32 x 32 with MUL

0003be84 <__mulsi3>:
   3be84:	db 01       	movw	r26, r22
   3be86:	8f 93       	push	r24
   3be88:	9f 93       	push	r25
   3be8a:	0f 94 74 df 	call	0x3bee8	; 0x3bee8 <__muluhisi3>
   3be8e:	bf 91       	pop	r27
   3be90:	af 91       	pop	r26
   3be92:	a2 9f       	mul	r26, r18
   3be94:	80 0d       	add	r24, r0
   3be96:	91 1d       	adc	r25, r1
   3be98:	a3 9f       	mul	r26, r19
   3be9a:	90 0d       	add	r25, r0
   3be9c:	b2 9f       	mul	r27, r18
   3be9e:	90 0d       	add	r25, r0
   3bea0:	11 24       	eor	r1, r1
   3bea2:	08 95       	ret

Triggers a call to

  • __muluhisi3: Multiply an unsigned 16-bit integer with a 32-bit integer to a 32-bit result
  • __umulhisi3: Multiply 2 unsigned 16-bit integers to a 32-bit result
0003bee8 <__muluhisi3>:
   3bee8:	0f 94 7f df 	call	0x3befe	; 0x3befe <__umulhisi3>
   3beec:	a5 9f       	mul	r26, r21
   3beee:	90 0d       	add	r25, r0
   3bef0:	b4 9f       	mul	r27, r20
   3bef2:	90 0d       	add	r25, r0
   3bef4:	a4 9f       	mul	r26, r20
   3bef6:	80 0d       	add	r24, r0
   3bef8:	91 1d       	adc	r25, r1
   3befa:	11 24       	eor	r1, r1
   3befc:	08 95       	ret

0003befe <__umulhisi3>:
   3befe:	a2 9f       	mul	r26, r18
   3bf00:	b0 01       	movw	r22, r0
   3bf02:	b3 9f       	mul	r27, r19
   3bf04:	c0 01       	movw	r24, r0
   3bf06:	a3 9f       	mul	r26, r19
   3bf08:	70 0d       	add	r23, r0
   3bf0a:	81 1d       	adc	r24, r1
   3bf0c:	11 24       	eor	r1, r1
   3bf0e:	91 1d       	adc	r25, r1
   3bf10:	b2 9f       	mul	r27, r18
   3bf12:	70 0d       	add	r23, r0
   3bf14:	81 1d       	adc	r24, r1
   3bf16:	11 24       	eor	r1, r1
   3bf18:	91 1d       	adc	r25, r1
   3bf1a:	08 95       	ret

AFTER

No call to __mulsi3 anymore. Also __udivmodsi4 (used on long) changes to __udivmodhi4 (short type) which is neat.

OCR4C = (((uint16_t)duty) * ((uint16_t)((TIMSK4 & _BV(OCIE4A)) ? OCR4A : 255U))) / 255U;
    e566:	30 e0       	ldi	r19, 0x00	; 0
    e568:	80 91 72 00 	lds	r24, 0x0072	; 0x800072 <__TEXT_REGION_LENGTH__+0x7c2072>
    e56c:	4f ef       	ldi	r20, 0xFF	; 255
    e56e:	50 e0       	ldi	r21, 0x00	; 0
    e570:	81 ff       	sbrs	r24, 1
    e572:	04 c0       	rjmp	.+8      	; 0xe57c <setExtruderAutoFanState(unsigned char)+0x82>
    e574:	40 91 a8 00 	lds	r20, 0x00A8	; 0x8000a8 <__TEXT_REGION_LENGTH__+0x7c20a8>
    e578:	50 91 a9 00 	lds	r21, 0x00A9	; 0x8000a9 <__TEXT_REGION_LENGTH__+0x7c20a9>
    e57c:	24 9f       	mul	r18, r20
    e57e:	c0 01       	movw	r24, r0
    e580:	25 9f       	mul	r18, r21
    e582:	90 0d       	add	r25, r0
    e584:	34 9f       	mul	r19, r20
    e586:	90 0d       	add	r25, r0
    e588:	11 24       	eor	r1, r1
    e58a:	6f ef       	ldi	r22, 0xFF	; 255
    e58c:	70 e0       	ldi	r23, 0x00	; 0
    e58e:	0f 94 10 dd 	call	0x3ba20	; 0x3ba20 <__udivmodhi4>
    e592:	70 93 ad 00 	sts	0x00AD, r23	; 0x8000ad <__TEXT_REGION_LENGTH__+0x7c20ad>
    e596:	60 93 ac 00 	sts	0x00AC, r22	; 0x8000ac <__TEXT_REGION_LENGTH__+0x7c20ac>

@leptun
Copy link
Collaborator

leptun commented Apr 15, 2023

Editing print fan speed in the menus works fine

Timer4 is used for the ALTFAN PWM and for the beeper tone. The print fan is not controlled by this. Try sending different M300 frequencies and also maybe try to have the side fan running to see that the pwm output doesn't change too much when the timer frequency changes. You need the altfan to test this. I don't have an altfan at the moment on the printer, so can't test this.

It's 1 byte on ATmega32u4 and ATmega16u4

Teeeechnically it's 10bit, but it doesn't matter since we don't use Timer4 there anyway.

@gudnimg gudnimg added this to the FW 3.14.0 milestone Apr 16, 2023
@gudnimg
Copy link
Collaborator Author

gudnimg commented Apr 16, 2023

Unfortunately I don't have an altfan :(

Hopefully someone at Prusa or someone from the Community can try this change. Let's leave it for 3.14 for now (focus on 3.13 instead).

@leptun
Copy link
Collaborator

leptun commented Apr 16, 2023

I have altfans, but none on a printer. I just didn't find the time to service the printer in the past month or so. I'm planning to switch to the latest mmu3 extruder parts, altfan, superpinda, new print fan that is not broken, latest einsy revision and the lcd with the broken bootloader rendering.

As soon as I find the time to switch to the altfan, I'll test this.

Using uint16_t instead of uint32_t reduces code size
and probably is quicker to execute

OCR4C register is 2 bytes on ATmega2560
It's 1 byte on ATmega32u4 and ATmega16u4

Change in memory:
Flash: -80 bytes
SRAM: 0 bytes
@gudnimg
Copy link
Collaborator Author

gudnimg commented Oct 15, 2023

Rebased PR to sync with MK3 branch. (462 commits)

@github-actions
Copy link

All values in bytes. Δ Delta to base

Target ΔFlash ΔSRAM Used Flash Used SRAM Free Flash Free SRAM
MK3S_MULTILANG -80 0 245918 5633 8034 2559
MK3_MULTILANG -80 0 245204 5642 8748 2550

@3d-gussner 3d-gussner self-assigned this Nov 18, 2023
@3d-gussner 3d-gussner merged commit 3216ef8 into prusa3d:MK3 Nov 29, 2023
5 checks passed
@3d-gussner 3d-gussner removed their assignment Nov 29, 2023
@gudnimg gudnimg deleted the tone-opt branch September 1, 2024 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants