I am currently trying to get the best performance out of this 48 MHz RISC-V, to implement an HD6301 emulator (the HD6301 is clocked at 4 MHz, which gives an instruction clock of 1M ticks/sec).
Because of the RISC instruction set, you almost have to work out in advance the code you expect the compiler to produce.
Take this code (if C != 0, branch to PC+2+depl; otherwise just advance to PC+2; either way add 3 to the tick counter):
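void op_bcs(void)
{
    /* Branch taken: PC + 2 + signed displacement stored in the byte after the opcode */
    if (regs.CC_C != 0) regs.PC += (2 + (signed char)rom[(regs.PC + 1) & 0xFFF]);
    else regs.PC += 2;  /* Branch not taken: skip the 2-byte instruction */
    TICKS(3);           /* 3 ticks in both cases */
}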
It produces the following code (18 instructions, 46 bytes):
00002688 <op_bcs>:
2688: 38c18793 addi a5,gp,908 # 2000100c <regs>
268c: 4398 lw a4,0(a5)
268e: 01d7c603 lbu a2,29(a5)
2692: 00270693 addi a3,a4,2
2696: ca19 beqz a2,26ac <op_bcs+0x24>
2698: 0705 addi a4,a4,1 // PC+1
269a: 0752 slli a4,a4,0x14 // <<20 (And FFF)
269c: 661d lui a2,0x7
269e: 8351 srli a4,a4,0x14 // >>20
26a0: cc060613 addi a2,a2,-832 # 6cc0 <rom>
26a4: 9732 add a4,a4,a2
26a6: 00070703 lb a4,0(a4)
26aa: 96ba add a3,a3,a4
26ac: 4f98 lw a4,24(a5)
26ae: c394 sw a3,0(a5)
26b0: 070d addi a4,a4,3
26b2: cf98 sw a4,24(a5)
26b4: 8082 ret
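The slli/srli pair in the listing above is the compiler's way of computing `& 0xFFF`: the RISC-V `andi` immediate is a sign-extended 12-bit value, so 0xFFF cannot be encoded directly, and two shifts are cheaper than building the constant in a register first. A minimal sketch of the equivalence, with hypothetical function names that are not part of the emulator:

```c
#include <stdint.h>

/* Keep the low 12 bits, written the way the C source expresses it. */
static uint32_t mask_and(uint32_t x)
{
    return x & 0xFFFu;
}

/* Same result with the shift pair seen in the disassembly:
   shift left by 20 drops the upper 20 bits, the logical shift
   right by 20 brings the remaining 12 bits back down. */
static uint32_t mask_shift(uint32_t x)
{
    return (x << 20) >> 20;
}
```

Both functions return the same value for every 32-bit input, which is why the compiler is free to pick the shorter encoding.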
In the end I coded the BCC opcode (the same, only with bnez instead of beqz) like this, using a few tricks:
void op_bcc(void)
{   // romendp1 points to ROM+0x1001, i.e. end of ROM + 1
    // PCw is the int16_t part of PC, so it is negative for ROM addresses
    u8 depl = (regs.CC_C == 0) ? regs.romendp1[regs.PCw] : 0;
    // Do the arithmetic on PC rather than PCw, to avoid the extra
    // adjustment to a 16-bit word
    regs.PC = regs.PC + 2 + (signed char)depl;
    TICKS(3);
}
Which produces the following (16 instructions, 40 bytes). Can it be shorter?
00002660 <op_bcc>:
2660: 38c18793 addi a5,gp,908 # 2000100c <regs>
2664: 01d7c703 lbu a4,29(a5)
2668: 4601 li a2,0
266a: e719 bnez a4,2678 <op_bcc+0x18>
266c: 00079683 lh a3,0(a5)
2670: 4bd8 lw a4,20(a5)
2672: 9736 add a4,a4,a3
2674: 00070603 lb a2,0(a4)
2678: 4398 lw a4,0(a5)
267a: 4f94 lw a3,24(a5)
267c: 0709 addi a4,a4,2
267e: 9732 add a4,a4,a2
2680: 068d addi a3,a3,3
2682: c398 sw a4,0(a5)
2684: cf94 sw a3,24(a5)
2686: 8082 ret
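To make the romendp1/PCw trick easier to follow, here is a minimal sketch of why the biased pointer replaces the add, the mask and the address load. The type name `regs_t`, the helper names and the field order are assumptions for illustration and do not match the real `regs` layout; the sketch also assumes a little-endian target (as on RISC-V), C11 anonymous unions, a PC that stays in 0xF000..0xFFFF, and a ROM buffer padded by two bytes so that `rom + 0x1001` is a valid pointer.

```c
#include <stdint.h>

#define ROM_SIZE 0x1000u

/* Two spare bytes so that rom + 0x1001 is still a valid pointer here. */
static uint8_t rom[ROM_SIZE + 2];

/* Hypothetical register file, not the emulator's actual struct. */
typedef struct {
    union {
        uint32_t PC;   /* full-width program counter, 0xF000..0xFFFF  */
        int16_t  PCw;  /* low 16 bits read as signed: -4096..-1       */
    };
    const uint8_t *romendp1;  /* rom + 0x1001                          */
} regs_t;

static void regs_init(regs_t *r, uint16_t pc)
{
    r->PC = pc;
    r->romendp1 = rom + 0x1001;
}

/* Straightforward version: add 1, mask to 12 bits, index from rom. */
static uint8_t fetch_masked(const regs_t *r)
{
    return rom[(r->PC + 1) & 0xFFFu];
}

/* Biased version: PCw is negative, so romendp1[PCw] lands on
   rom[0x1001 + PCw], which is the byte after the opcode. */
static uint8_t fetch_biased(const regs_t *r)
{
    return r->romendp1[r->PCw];
}
```

For example, with PC = 0xF123 both helpers read rom[0x124]. The two only differ at PC = 0xFFFF, where the masked version wraps to rom[0] and the biased one reads rom[0x1000]; a branch opcode in the very last ROM byte is presumably not a case worth handling.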
Carefully coding the rest of the opcodes like this, and using unions/structs for the HD6301 registers, allowed the small CH32X035 to emulate more than 1,000,000 ticks per second.
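As an illustration of the union/struct idea, here is a minimal sketch for the accumulators. On the HD6301 the 16-bit D register is the pair A:B with A as the high byte, so on a little-endian host B has to come first in memory. The type and function names are hypothetical and the real emulator's layout may differ; condition flags are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical accumulator layout; assumes a little-endian host
   and C11 anonymous structs. */
typedef union {
    uint16_t D;        /* 16-bit double accumulator A:B */
    struct {
        uint8_t B;     /* low byte of D  */
        uint8_t A;     /* high byte of D */
    };
} hd6301_acc_t;

/* Add a 16-bit value to D with a single host addition; A and B see
   the result without any shifting or masking. Flags are not modeled. */
static void add_to_d(hd6301_acc_t *acc, uint16_t value)
{
    acc->D = (uint16_t)(acc->D + value);
}
```

With A = 0x12 and B = 0xFF, adding 1 through `add_to_d` leaves D = 0x1300, so the carry from B into A comes from the host's 16-bit add rather than from extra code.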
Edit: I think that this version (i.e. no depl=0, and the displacement is only added when the branch is taken)
void op_bcc(void)
{
    if (regs.CC_C == 0) regs.PC = (regs.PC + 2) + (s8)regs.romendp1[regs.PCw];
    else regs.PC = (regs.PC + 2);
    TICKS(3);
}
is even better: 15 instructions, same size (40 bytes), but only 10 instructions executed if C=1, instead of 12.
00002660 <op_bcc>:
2660: 38c18793 addi a5,gp,908 # 2000100c <regs>
2664: 4398 lw a4,0(a5)
2666: 01d7c683 lbu a3,29(a5)
266a: 0709 addi a4,a4,2
266c: ea81 bnez a3,267c <op_bcc+0x1c>
266e: 00079603 lh a2,0(a5)
2672: 4bd4 lw a3,20(a5)
2674: 96b2 add a3,a3,a2
2676: 00068683 lb a3,0(a3)
267a: 9736 add a4,a4,a3
267c: 4f94 lw a3,24(a5)
267e: c398 sw a4,0(a5)
2680: 00368713 addi a4,a3,3
2684: cf98 sw a4,24(a5)
2686: 8082 ret