Deja Vu #05: CODING: Techniques and Optimizations for ZX Spectrum

SoundTrack: DEVOUR OF BRAIN BY DX-69 1998  
__________________________________________

(C) Card!nal/PGC/BD
__________________________________________

Hello, dear readers of our magazine! Daniil asked me to write an article on coding, and here I am sitting behind a text editor. This article does not have a specific goal; I will just talk about optimization and, in general, what I have accumulated since the release of the fourth issue of Deja Vu.

First, let's talk about outputting sprites without attributes. Here is the standard procedure:

LD HL,adr ;sprite address
LD DE,#4000 ;screen address
LD B,192 ;height in pixels
LD C,32 ;width in character positions
LOOP2 PUSH DE
PUSH BC
LD B,0
LDIR
POP BC
POP DE
INC D
LD A,D
AND 7
JP NZ,METKA1
LD A,E
ADD A,32
LD E,A
JP C,METKA1
LD A,D
SUB 8
LD D,A
METKA1 DJNZ LOOP2
RET

Now let's calculate how many cycles it works. Without going into details on how I calculated this, I will say that it works about 146000 cycles when outputting a sprite the size of the screen. On average, it takes 23.7 cycles per byte. In general, this is a decent procedure for outputting static images, but if you need a fast, UNIVERSAL procedure, then look at the listing below. By "universal," I mean that you can arbitrarily set the height, width of the sprite, and its coordinates on the screen.

CALL DECRUN ;this program creates
;a table of addresses
;of the screen

LD HL,adr ;sprite address
LD D,3 ;X coord in character positions
LD E,8 ;Y coord in pixels
LD B,100 ;height in pixels
LD C,20 ;width in character positions

WIWOD DI
LD A,D
LD (WIW1+1),A
LD A,C
ADD A,A
SUB 64
NEG
LD C,A
LD A,B
LD B,0
LD IX,WIW2
ADD IX,BC
EX DE,HL
LD H,B
ADD HL,HL
LD BC,BUFER
ADD HL,BC
LD (STACK+1),SP
LD SP,HL
EX DE,HL
LD B,A
LOOP1 POP DE
LD A,E
WIW1 ADD A,0
LD E,A
LD C,D
JP (IX)
WIW2
DUP 32
LDI ;LDI command is repeated
EDUP ;32 times

DJNZ LOOP1
STACK LD SP,0
EI
RET

DECRUN LD HL,BUFER
LD DE,#4000
LD B,192
LOOP LD (HL),E
INC HL
LD (HL),D
INC HL
INC D
LD A,D
AND 7
JR NZ,METKA
LD A,E
ADD A,32
LD E,A
JR C,METKA
LD A,D
SUB 8
LD D,A
METKA DJNZ LOOP
RET
BUFER DEFS 384

The procedure works 108076 cycles when outputting a sprite the size of the screen, which is 1.35 times faster than the previous one. On average, it takes 17.6 cycles per byte. The subroutine DECRUN is called only once, and then you can use the WIWOD subroutine as many times as you want. I will only say that you need to disable interrupts since the stack is used. Try to figure out how everything works by yourself. I will only hint at why the command LD C,D is needed before the command JP (IX). As you know, LDI decreases the BC register pair during operation, so to prevent the B register from being corrupted, we need to constantly keep in the C register a number greater than the maximum width of the sprite, that is, greater than 32, and the D register always contains a number greater than 32.

Well, we have figured out sprite output; let's talk about rotating a sprite at any angle (0 - 360). In the fourth Deja Vu, there was the source code of Kolotov's "TURN SPRITES." Honestly, I saw this program even earlier and will say that the quality of the program's work did not please me much :-(. I tried to rotate a filled square of 5 by 5 character positions by 45 degrees and saw an unpleasant picture; the square already looked more like a square sieve, and using such a program was difficult. The problem lies in the rotation algorithm and the lack of consideration for the screen structure. I implemented a slightly different algorithm, and the result is simply amazing. The image is almost (considering the low screen resolution) not distorted. For example, if you rotate a filled circle with a radius of 50 by 45 degrees around its center, you will get the same circle. However, I did not come up with this algorithm; I took it from the magazine ZX NEWS 3. There was an article about sprite rotation, and a program was provided in BASIC. This program worked for more than an hour, rotating the entire screen. There was no equivalent in the codes of this program, and the author confessed that he was not friends with the built-in BASIC calculator. Honestly, I also do not get along with the built-in calculator, so I implemented this algorithm in code without using the calculator. The first version of my program worked for 3.5 minutes, rotating the entire screen regardless of the angle. I decided that this was complete SUXX and optimized it a bit. In the end, it took about 40 seconds, which is approximately 5 times faster than the initial version and 100 times faster than in BASIC, and this, mind you, without losing image quality. Further optimization was too lazy, although there was a prospect of reducing the time to 10-15 seconds, or even faster, but I can provide you with this joy, SERZH, you understand me :-). In the attachment, you will find the source code in ALASM 3.8c format. The source contains comments, so you will figure it out. Do not look at the shabby code; there was no time to polish it. The origin of coordinates is considered the lower left corner of the screen, if I am not mistaken. The sprite to be rotated (don't let yourself dry out!) must already be on the screen. The program uses a decent amount of memory (6144 + about 2 kg) bytes for the buffer, you will see there... I almost forgot to explain the rotation algorithm. A point with coordinates (x0,y0) is taken, rotated at an angle, and the coordinate (x1,y1) is obtained; from the coordinate (x1,y1), a point (if it exists) is taken and transferred to coordinate (x0,y0), then the next (x0,y0) is taken, and so on. Checks for going beyond the screen are made, OF COZZZ, so everything is on the khokhloma :-).

So? Let's move on! Here, a certain CRANK asked me to help understand the "abstract xor's," so I will say this. You must learn to understand xor's by yourself (with the help of STS v6.2, for example). Try to figure out what it xor's, trace it, carefully monitor the registers, resident STS's move, if necessary, to a safe place. The simplest xor looks like this:

LD HL,adr
LD BC,length
LOOP LD A,(HL)
XOR 23
LD (HL),A
INC HL
DEC BC
LD A,B
OR C
JR NZ,LOOP
RET

If you have a PC, you can type the program in assembler or in hiew (if you are a pervert :-)).

mov dx,adr
mov bx,length
loop mov si,dx
mov al,[si]
xor al,23
mov [si],al
inc dx
dec bx
jnz loop
retn

Of course, I don’t care where to code, but SPECCY, OF COZZZ, RULEZ forever!!!

With xor's, we seem to have figured it out, both on the Spectrum and on the PC, so let's move on :-). I would like to say a few words about switching pages. Here, for example, is a program I saw from SERZH in one of his rulez (as always) articles:

LD BC,#7FFD
OUT (C),A
LD (BANK),A
RET
BANK DEFB 0

Or something like that. What do we see? Everything seems correct... But no! Let's imagine a situation where interrupts came in between the commands out (c),a and ld (bank),a. The interrupts will work and return back. But when exiting, the old value of BANK will be restored, and as a result, colorful squares will appear on the screen, so the value must be saved before out (c),a.

Since Deja Vu will now be released without protection, I wanted to talk about a cool copy protection method invented by me (the disk cannot be copied even on PC and Amiga) and even wanted to throw the source codes into the application (formatter and format reader), but then I thought it would be too much! And I gave up this idea. However, if I am asked nicely, I might tell. Now let's move on.

I want to say a few words about the game Black Raven, a cool game, I must say. But some levels are just killer (I mean almost impassable), and since I am not a gamer and do not indulge in games much, I made my life easier by inserting a couple of POKES and saw the FINAL CUT! If you zero out three bytes at address #D5CE in RAM0, you will stop running out of money. Well, the current level number is stored at address #79D8. For example, enter the number of the next level there, do "retry," and you are already at another level. If you enter #10 there, you will see the FINAL CUT. "All this is wonderful," you will say - "But what about the protection?" Here you are right; even the shadow won't help. So figure it out yourself, break it however you want. I do not have the moral right to teach you to break commercial products. I will only say that I spent three days inserting POKES!

Next, let's talk about the multiplication algorithm. The simplest multiplication procedure on SPECCY looks like this: HL = A*E

LD H,A
LD L,0
LD D,L
LD B,8
LOOP ADD HL,HL
JR NC,$+4
ADD HL,DE
DJNZ LOOP
RET

It is not difficult to calculate the number of cycles: a minimum of 310, a maximum of 358 without considering RET. Of course, if you unfold the loop, we get a minimum of 206 and a maximum of 254 cycles. But can it be faster? Yes, of cozzz! But for this, we need to remember basic algebra. More precisely, the formulas for the abbreviated multiplication:

Square of a sum: (a+b)^2=a^2+2ab+b^2
Square of a difference: (a-b)^2=a^2-2ab+b^2

Let's take these formulas into account. To write a multiplication procedure, we need to form a table of squares of numbers from 0 to 255 inclusive. The square table will be the main part of the program. The address of the table must be a multiple of 256, that is, the least significant byte = 0. This is done by the following program.

SET_TBL LD DE,TABL
LD H,E
LD L,E
LD B,E
LD C,E
SET_T1 LD A,L
LD (DE),A
INC D
LD A,H
LD (DE),A
DEC D
ADD HL,BC
INC C
ADD HL,BC
INC E
JR NZ,SET_T1
RET
TABL DEFS 512

And here is the multiplication procedure itself. Numbers from 0 to 255 are set in registers L and E. The answer is in HL.

MULS LD H,'TABL
LD C,(HL)
INC H
LD B,(HL)
LD A,L
ADD A,E
JP NC,MUL4
JP M,MUL1
SUB E
SUB E
JP NC,MUL2
LD A,E
SUB L
MUL2 LD L,E
LD D,(HL)
DEC H
LD E,(HL)
LD L,A
LD A,(HL)
INC H
LD H,(HL)
LD L,A
SBC HL,BC
OR A
SBC HL,DE
LD A,L
CPL
LD L,A
LD A,H
CPL
LD H,A
INC HL
SRL H
RR L
RET
MUL4 LD L,E
LD D,(HL)
DEC H
LD E,(HL)
LD L,A
LD A,(HL)
INC H
LD H,(HL)
LD L,A
SBC HL,BC
OR A
SBC HL,DE
SRL H
RR L
RET
MUL1 LD A,L
SUB E
JP NC,MUL3
LD A,E
SUB L
MUL3 LD L,E
LD D,(HL)
DEC H
LD E,(HL)
LD L,A
LD A,(HL)
INC H
LD H,(HL)
LD L,A
SBC HL,BC
OR A
SBC HL,DE
SRL H
RR L
LD A,L
CPL
LD L,A
LD A,H
CPL
LD H,A
INC HL
RET

As a result, the number of cycles is a minimum of 141 and a maximum of 207, which, you must admit, is a bit. This algorithm was borrowed from the game BATTLE COMMAND 128. Moreover, this was mentioned in the electronic magazine "Spectrum Expert 1." However, a slightly different program was provided there, which took into account the signs during multiplication, and the numbers were limited from -128 to +127, which is not always convenient.

Now let's talk about calculating the square root. In ZX FORMATE 7, there was some program for calculating the root, but I will say directly that it was not very good. The problem is speed. I tried to calculate the square root of 1024 using that program. I traced it with STS, and it turned out that the program does division 5 (five) times, and when calculating the square root of 65535, it does 10 (ten!) divisions. Moreover, the division procedure takes about 1000 cycles or more. I then thought a bit and wrote my own. Its listing is given below. My program performs only one division if the number falls within the range from 2 to 16383, and performs two divisions if the number is in the range from 16384 to 65535. For numbers 0 and 1, no divisions are performed. The program works according to the modified Newton's algorithm that I created. Try to figure out how everything works by yourself. I will give you a hint: the table contains so-called average roots of numbers from 2^1 to 2^15. They are calculated using the formula:

(sqr(2^n)+sqr(2^(n+1)))/2

SQRT LD A,H ;HL = SQR (HL)
OR L
RET Z
LD D,H
LD E,L
LD B,15
ADD HL,HL
JR C,SQRT2
DEC B
ADD HL,HL
JR C,SQRT2
DEC B
SQRTL ADD HL,HL
JR C,SQRT1
DJNZ SQRTL
LD HL,1
RET
SQRT1 LD HL,TABL
LD C,B
LD B,0
ADD HL,BC
LD C,(HL)
SQRT_ PUSH BC
CALL DIV
POP BC
ADD HL,BC
SRL H
RR L
RET
SQRT2 PUSH DE
CALL SQRT1
POP DE
LD B,H
LD C,L
JR SQRT_

DIV PUSH BC ;HL = DE/BC with rounding
LD A,B ;answer
CPL
LD B,A
LD A,C
CPL
LD C,A
INC BC
LD HL,0
LD A,E
ADD A,A
RL D
DUP 16
ADC HL,HL
ADD HL,BC
JR C,$+4
SBC HL,BC
RLA
RL D
EDUP
LD E,A
POP BC
SRL B
RR C
OR A
SBC HL,BC
EX DE,HL
RET C
INC HL
RET
TABL DEFB 1,2,2,3,5,7,10,14,19,27
DEFB 39,55,77,109,155,219

The execution time I did not count, but it is approximately 1500 - 2500 cycles. The program can be sped up if a faster division procedure is used. But it is possible to do without divisions at all if another algorithm is used. I have one in mind, but so far I do not have enough free time to figure it out and write the program :-(

That's where my narrative ends. I will only say that the listings used ALASM assembler directives. For example:

DUP 16
program body
EDUP

means that the program body is assembled 16 times.

Finally, I want to inform you that I want to prepare a long article on 3D graphics on SPECCY for the 6th issue. It will cover: formulas for rotating points in space, the algorithm for drawing filled triangles, decomposition, the algorithm for hiding invisible faces, the algorithm for constructing complex non-convex objects, algorithms for filling faces depending on the position of the light source, and something else. Naturally, examples in assembler will be provided. It would take desire and free time. That's all for now. BYE.

Contents of the publication: Deja Vu #05

  • Аперативчик - Max
    Detailed instructions on managing the DEJA VU interface, highlighting different input methods and navigation commands. Explanation of the new and old interfaces for enhanced user experience. Discussion on additional features like frame scrolling and music management.
  • Аперативчик - Max
    Discussion on supporting machines with more than 128k memory, leading to separate shells for 128k and 256k systems. Testing was mainly done on Scorpion and Profi, with functionality on other models anticipated. Article includes guidance on unpacking source files and insights on using improved algorithms.
  • Тема - M.M.A
    This article explores the theory behind digitizing sound on ZX Spectrum, focusing on sampling and quantization processes. It provides practical insights into converting sound files using specific hardware and software. Additionally, it offers methods to enhance sound quality while working within the hardware limitations.
  • Theme
    The article discusses the Save Our Scene initiative aimed at uniting Spectrum users and developers to promote software distribution and enhance the scene's development.
  • Charter of the Amazing Soft Making Association
    Discussion of the founding charter of the Amazing Soft Making association, detailing its goals, membership criteria, and operational principles.
  • Theory of Magazine Creation
    The article provides a detailed guide for aspiring magazine creators, focusing on technical aspects such as interface design, memory management, text formatting, and music integration for ZX Spectrum publications.
  • Solder Drop
    The article provides a personal account of purchasing and using the General Sound device for ZX Spectrum, detailing installation and sound performance. It discusses the initial issues encountered and praises the enhanced audio experience in compatible games. The author encourages further software adaptation for the device and reflects on multimedia capabilities with simultaneous hardware use.
  • Solder Drop
    The article discusses the capabilities of Sound Forge 4.0c for professional audio processing on PCs, highlighting its extensive features such as sound editing, effects, and restoration tools.
  • SOFTWARE
    The article reviews the latest software developments for the ZX Spectrum from Samara, including updates to MAXSOFT SCREEN PACKER, File Commander, and new applications like S-Terminal.
  • SOFTWARE - Card!nal
    Review and walkthrough of the logical graphic adventure game 'Operation R.R.' with detailed level instructions. Discussion on game elements like music choice and graphic design. Mentions new coder MAX/CYBERAX/BINARY DIMENSION's involvement.
  • SOFTWARE
    Discussion on the current state and evolution of the demoscene, highlighting the rise of 4K intros and upcoming competitions like FUNTOP'98.
  • CODING
    Article discusses assembly language coding techniques for optimizing screen scrolling on ZX Spectrum, featuring example code and performance analysis.
  • CODING - RLA
    The article explores stack manipulation techniques during second type interrupts for graphical effects on ZX Spectrum. It discusses solutions for preserving data integrity when interrupts disrupt graphical operations. Practical examples are provided to handle stack issues efficiently.
  • CODING
    The article describes the MS-PACK packer and its DEPACKER, detailing usage scenarios and providing BASIC and assembly code examples for handling packed files. It emphasizes optimizing performance by allowing unpacking with interrupts enabled and separating the DEPACKER from packed files. Additionally, it includes insights on programming techniques for loading and executing BASIC files on ZX Spectrum.
  • CODING
    The article discusses various coding techniques for ZX Spectrum, focusing on sprite rendering, rotation algorithms, and optimization methods to enhance performance.
  • ANOTHER WORLD
    Discussion on the evolution of multimedia technologies and their impact on various fields, including education and entertainment. It covers advances in computer hardware and software that have facilitated the integration of audio, video, and text. The article reflects on past developments and speculates on the future of multimedia systems.
  • ANOTHER WORLD
    Comparison of PC and Amiga systems highlighting performance, software costs, and user experience with multimedia capabilities.
  • Honor Roll
    Interview with PROGRESS discusses their creative journey on ZX Spectrum and AMIGA, addressing challenges in demomaking and the current state of the scene.
  • Honor Roll
    The article details the activities and future projects of the Eternity Industry team, based in Kovrov, including successful releases and collaborations with other groups.
  • Honor Roll
    Discussion of the Artcomp'98 festival, focusing on its mail-in format and guidelines for various competitions, including demo, graphics, and music categories.
  • Honor Roll
    The article provides a glossary of terms used in the demo scene, explaining roles such as musician, coder, and graphician, as well as different types of demos and effects. It serves as a useful resource for understanding the terminology and dynamics of the community. This is a descriptive piece aimed at educating readers about the jargon of the demo scene.
  • Honor Roll
    The article discusses the issues with mouse support in various ZX Spectrum magazines and the frustrations of users when encountering compatibility problems. It critiques developers for not adhering to standards, leading to poor user experiences. The author expresses the importance of consistent improvements in software for the ZX Spectrum community.
  • Honor Board
    The article discusses the process of creating tricolor images for ZX Spectrum using Photoshop and a simplified approach. It outlines how to divide an image into RGB channels and convert them for use on the Spectrum. Additionally, it provides tips on how to manage the files for optimal results.
  • Honor Roll
    The article discusses the comparison and perspectives on various computer systems, particularly emphasizing the strengths of AMIGA over PC and advocating for appreciation of all machines.
  • Seven and a Half
    Article discusses the humorous absurdities and peculiarities of military training and academia, blending satire with real anecdotes and witty observations.
  • Seven and a Half
    The article provides a satirical manual on programming methodologies, mocking the rigidity of formal programming practices and advocating for a more creative approach to coding.
  • Seven and a Half
    Instructions on safe sex practices, including guidelines on eligibility, preparation, and actions during and after the sexual session, along with handling emergency situations.
  • Seven and a Half
    The article discusses a call for a talented artist in Krasnodar for a ZX Spectrum group, raises concerns about the unethical practices of Scorpion regarding software rights, and critiques a video review of E'97.
  • Seven and a Half
    The article 'Семь и 1/2' narrates a humorous picnic adventure involving the editorial team of Deja Vu, highlighting their camaraderie and mishaps while preparing a barbecue.
  • Trial of the Pen
    The article is a humorous take on the fictional adventures of Winnie the Pooh as he interacts with computers and friends, discussing the absurdities of technology and daily life.
  • First Pen
    The article discusses the new section in Deja Vu dedicated to fantasy and science fiction literature, featuring book reviews and reader participation in content creation.
  • Advertisement
    The article is an advertisement section from Deja Vu #05, promoting collaborations with designers and musicians for future issues, and offering various software and hardware for ZX Spectrum.
  • News
    The article announces the launch of a new magazine, AMIGA RULES, focused on the AMIGA computer, addressing the lack of quality Russian-language publications. It aims to provide information on programming, hardware, software, and gaming, while fostering a community among AMIGA enthusiasts. The magazine will include contributions from readers and regular updates on the AMIGA scene.