Wednesday, July 30, 2014

Retro Pinball: Reverse Engineering the Bally AS-2518-51 Sound Module Assembly

One of my friends has a beautiful 1980 Nitro Ground Shaker pinball machine. It's in great condition, but there is absolutely no sound output. As I was trying to figure out a way to troubleshoot the sound module, I found myself falling deeper and deeper into the 8-bit pinball rabbit hole. In this post I will try and explain the inner workings of the amazing Bally AS-2518-51 Sound Module Assembly.


The 2518-51 Sound Module is based on the famous Motorola 6800 series microprocessor (U3 in the picture and in the schematic). Sound synthesis is performed by a General Instrument AY-3-8910 (U1). There's also a 6820 Peripheral Interface Adapter (PIA) (U2) to interface the AY38910 to the MPU, a ROM (U4) that contains game-specific code and data and a RAM chip (U10). The microprocessor is clocked by a 3.58 MHz crystal. Due to the built-in clock divider the 6800 operates at 895 kHz.

The Bally AS-2518-51 Sound Module Assembly as found in Nitro Ground Shaker
The pinball system interfaces with the sound module using a 15 pin connector (J1) that provides both power and input. The only output of the sound module happens through the speaker connector (J2).

The manual contains schematics for all the electronics in the pinball machine. The schematic for the sound assembly module is reproduced below.

Schematic of the AS-2518-51 Sound Module
The sound module's basic operation is as follows. The pinball's main processing board (known as the MPU Module Assembly) asserts the number of the sound it wants to play on the sound module's address lines on the J1 connector (pins 1, 2, 3, 4, 12 and optionally 13). Then it pulses the sound interrupt line on pin 8. The 6800 picks up this signal (through the PIA) and jumps into action. It reads the number of the sound to play (from the IO port on the AY-3-8910, through the PIA) and generates commands for the AY-3-8910 (again, through the PIA) to synthesize the requested sound.

Reconstructing the Memory Map

The 6800 accesses its on-bus peripherals (RAM, ROM and PIA) by mapping pieces of its 64 kiB address space to each individual peripheral. This is all done in hardware by wiring some of the lines of the address bus to pins of the peripheral chips that cause the chip to turn on or off. Pins that cause a chip to turn on are usually called CS (for chip select). Pins that cause a chip to turn off are usually called /CS (the negation of CS). A chip is enabled if all CS lines are pulled high and all /CS lines are pulled low.
All chips on the bus are only allowed to be active when the address bus contains a valid address. When this is the case the 6800 pulls the VMA line high (for Valid Memory Address). This line is usually connected to a CS pin, fulfilling this requirement.

RAM chip connections
The RAM (U10) chip's data lines and address lines A0 through A6 are connected to the data and the address bus of the 6800 respectively. Two address lines (A7 and A12) are also wired to /CS pins. This means that the RAM chip is only active if the A7 and A12 lines are both low.

PIA chip connections
The PIA (U2) is active when A7 is high (CS) and A12 is low (/CS). Also note that the chip is only wired to address lines A0 and A1.
ROM chip connections

The ROM chip is enabled when NAND(VMA, A12) is low (/CS). This is the case when A12 (and VMA) are high. The ROM chip is connected to the address bus through lines A0 to A10. Line A11 can be enabled with a jumper when the ROM chip is 4 kiB instead of 2kiB, but this is not the case for Nitro Ground Shaker.

Based on the above information we can reconstruct the memory map as seen by the 6800.

StartEndA7A12DeviceAddress LinesActual Range
0x00000x007F00RAMA0 - A60x0000 - 0x007F
0x00800x0FFF10PIAA0 - A10x0080 - 0x0083
1ROMA0 - A100x1000 - 0x17FF

Disassembling the ROM contents

ROM images can be obtained from the Internet Pinball Database for use with the PinMAME emulator. We are interested in the roms for the regular (6-digit) version of the pinball machine. The ROM bundle consists of four files. Three are for the MPU Module Assembly, and one is for U4 on the Sound Module Assembly. We are interested in the latter.

The game configuration in the PinMAME source code contains the following lines:

BY35_ROMSTART888(ngndshkr,"776-17_1.716",CRC(f2d44235) SHA1(282106767b5ec5180fa8e7eb2eb5b4766849c920),
                          "776-11_2.716",CRC(b0396b55) SHA1(2d10c4af7ecfa23b64ffb640111b582f44256fd5),
                          "720-35_6.716",CRC(78d6d289) SHA1(47c3005790119294309f12ea68b7e573f360b9ef))
BY51_SOUNDROM8(           "776-15_4.716",CRC(63c80c52) SHA1(3350919fce237b308b8f960948f70d01d312e9c0))

The ROM image for U4 (SOUNDROM) is contained in the file 776-15_4.716.

We disassemble the ROM contents using DASMx, a two-pass disassembler that supports the 6800 family of processors. When disassembling we must make sure to specify the correct origin address to ensure that the ROM contents line up with the addresses as seen by the processor. Since the ROM is mapped at address 0x1000 we must specify this when we invoke the disassembler:

dasmx -c6800 -o0x1000 -w ngndshkr\776-15_4.716

The result is a listing of data and instructions for the addresses in ROM.

Reverse Engineering the Boot Procedure

According to the datasheet of the 6800 family of processors, when the 6800 is started it will start execution at the address located at 0xFFFF and 0xFFFE. Other memory locations stored at the top of memory are used to react to other events:

Table 1 of the datasheet of the 6800 as published by Hitachi.
These addressess are mapped to addresses 0x1F8 and up on the actual ROM, since only address lines A0-A10 are used.

The ROM contains the following values:
17F8 : 10     
17F9 : 23 
17FA : 10     
17FB : 00     
17FC : 10     
17FD : 00     
17FE : 10     
17FF : 00

On (re)start, software interrupt or non-maskable interrupt (NMI, pin 6 connected to pushbutton SW1), execution starts at address 0x1000, which is the first address in ROM. On interrupt request (IRQ, pin 4 connected to the PIA) execution starts at address 0x1023.

First we initialize the stack. The stack pointer is set to 0x007F. The stack grows downward (toward lower addresses) and 0x007F is the highest address available in RAM.

1000 : 8E 00 7F    "   " [3] lds #$007F

Next up is basic PIA configuration. Refer tot the datasheet for the PIA for the meaning of the various registers.
1003 : 4F    "O" [2] clra
1004 : 97 83    "  " [4] staa X0083

Set PIA control register B (CRB) to zero. This is mainly to set bit CRB-2 to zero to allow us to write to the direction register using address 0x0082.

1006 : 43    "C" [2] coma
1007 : 97 82    "  " [4] staa X0082

Write 0xFF to the direction register for port B. This configures all pins of port B (PB) on the PIA as outputs.
1009 : 86 34    " 4" [2] ldaa #$34
100B : 97 83    "  " [4] staa X0083

Write 0x34 (00110100) to the control register for port B (CRB). This means:

 CRB-0/1:  Disable \IRQB by CB1
 CRB-2: Read/writes on 0x82 will now affect I/O port B. This enables output to the PB pins of the PIA.
 CRB-3/5:  (110) CB2 is output, CB2 set high. CB2 switches the analog output.

100D : 86 07    "  " [2] ldaa #$07
100F : 97 81    "  " [4] staa X0081

Set PIA Control Register A (CRA) to 0x07 (00000111):

CRA-0/1: Enable \IRQA by CA1, Set low-to-high transition. This is the main interrupt: when the pinball wants to play a sound it will trigger this interrupt. The PIA will then in turn lower the \IRQ line of the 6800 and an interrupt request will take place.
CRA-2: Select peripheral A register. Read/writes to 0x80 will now affect I/O port A. This enables us to read/write data from the AY-3-8910.
CRA-3/5: (000) CA2 setup does not matter as that pin is not connected.

1011 : 7F 00 3F    "  ?" [6] clr X003F

Clear memory location 0x003F. As of yet unexplained.

Then we launch a busy loop, presumably with the intention of giving the pinball machine as a whole a chance to come online. Interrupts request presented during this time as not handled, as interrupts are still masked.
1014 : 86 32    " 2" [2] ldaa #$32
1016 L1016:
1016 : CE 3D 2D    " =-" [3] ldx #$3D2D
1019 L1019:
1019 : 09    " " [4] dex
101A : 26 FD    "& " [4] bne L1019
101C : 4A    "J" [2] deca
101D : 26 F7    "& " [4] bne L1016

The inner loop at L1019 is executed 0x3D2D (15661) times and it consumes 8 cycles (see the timing information between square brackets), for a total of 125288 cycles per inner loop, or about 139 ms. This loop is executed 50 (0x32) times, yielding a delay of about 7 seconds.

101F : 96 80     "  " [3] ldaa X0080

Reading from 0x0080 gets us the value currently on port A of the PIA. This is the value presented to it by the AY-3-8910.

1021 : 0E     " " [2] cli
1022 : 3E     ">" [9] wai

Now we're ready to work. We unmask the interrupt requests and wait for an interrupt.

Playing a Sound: Overview

When the pinball machine wants to play a sound, it triggers the IRQ line on the 6800 through the interrupt logic of the PIA. This triggers the execution of the interrupt service routine (ISR) located at address 0x1023. The ISR performs some initialization and then reads the value asserted by the pinball machine on the AY-3-8910's IO port A to determine the sound that is being requested. Sounds are played with interrupts unmasked, which means that an incoming interrupt request can interrupt the currently playing sound. Since the ISR resets the stack at every invocation, the newer sound completely overrides the older one. An interesting note is that (at least on Nitro Ground Shaker) sound 3 is always played with interrupts masked, rendering it uninterruptible. Sound requests will be ignored while sound 3 is playing.

Sounds 0 through 4 are played using specialized subroutines that issue commands to the AY-3-8910 programmable sound generator using native 6800 code. Sounds 5 through 32 on the other hand rely on a custom virtual machine (VM) implementation that executes a domain specific bytecode.
Overview of sound playing logic

The bytecode can contain arithmetic operations on the AY-3-8910's registers, can perform both conditional and unconditional branches and even jump to, and return from, subroutines. It has a small stack and an explicit program counter. The sound VM is therefore a complete virtual computer optimized for the task of driving the sound generator.

Virtual Machine Bytecode

The VM code takes the form of  simple bytecode instructions where the high nibble of the first byte contains the instruction opcode to be executed. Execution is handled on a per-opcode basis using a table of pointers to opcode handler subroutines. The first parameter (places in the low nibble) is passed to the handler routine in accumulator A. Accumulator B contains the byte that follows the opcode as the second parameter. Whether this is actually used by the instruction is determined by the opcode handler, as it determines the instruction length by increasing the VM program counter VM_PC. By indexing relative to VM_PC and incrementing it accordingly afterward, the VM can execute instructions of arbitrary length. We have observed instructions of length 3.


By disassembling the opcode handlers we can understand the purpose of the different opcodes. All opcodes 0-F are in use, but I have not yet figured out the purpose and usage of some of them. I assigned the mnemonics based on my understanding of the instructions. The mnemonics used in the original documentation/source code are likely very different. Also, this table reflects my current understanding of the bytecode and the bytecode interpreter and it is likely to contain major mistakes and omissions.

Store VAL in AY-3-8910 register REG

REG += 1

REG -= 1
Branch to (relative) offset
Branch if REG is zero
Branch if REG is not zero
Pause. Use DELAY if A!=0, otherwise use default delay.
91 ?REG

A3CALL0?OFFS_HIGHOFFS_LOWCall bytecode (A=0) or native (A!=0) subroutine.

Return from bytecode subroutine
Write value in RAM to REG. Purpose unclear.
Read value in REG to RAM. Purpose unclear.

Freeze execution of sound (until next interrupt) (?)

End execution of sound. (?)


The  Bally AS-2518-51 Sound Module Assembly is an extremely interesting piece of kit. In this post we have described the general functionality of the Sound Module Assembly, we have reconstructed the memory map as seen by the 6800 processor, we have disassembled the firmware and taken a first stab at reverse engineering the the sound playing logic. 

The sounds are generated, either directly by the 6800 microprocessor or using a custom virtual machine programmed in a specially designed bytecode. Because of this, it is far from trivial to extract the sound effects directly from the ROM images into a description of the AY-3-8910 register states that can be dumped into an YM file or similar. It is much easier to instrument a full emulator (like PinMAME) if this is the goal.

It should however be possible to craft a program that "compiles" a YM file into either a native or a bytecode routine so that it can be added to a pinball machine as one of the sounds available. Whether it can be done within the hardware constraints and whether it can be cleanly patched into the firmware remains to be seen.

Wednesday, August 8, 2012

Complex CPU Addressing modes, for free!

One of the most powerful features of a central processing units is the ability to access memory using complex indexing modes. These modes are essential to implement compound data structures such as arrays, structures and classes in an efficient manner. Our FPGA soft-processor core will need to do all these things too.
The instruction set architecture (ISA) that we have developed in the previous post support only basic register indirection, like this:

MOV [R1], R2
MOV R1,[R2]

where the [R1] denotes the memory address pointed to by the register R1.

However, as we shall see, we can support rather complex indexing modes essentially for free!
Recall the layout of our execution unit:

Not that the memory address register (MAR) can only be loaded using bus3, which means taking a trip through the ALU. We can therefore load MAR with the result of an ALU operation at no additional cost. This gives us e.g. the following indirect addressing options:

MOV [R1 + R2], R3
MOV R1, [R2 - R3]

The additional information needed in the instruction is minimal, we just need to specify an ALU operation and the bus2 driver just like we do for regular ALU operations.

Note that the above also holds true for reading/writing the memory block register (MBR) itself! We can only write to the MBR by going through the ALU. Also, when we want to store the contents of MBR in a register, we have to go through the ALU too! This gives us the opportunity to replace the stores with ALU operations, yielding instructions of the form

MOV [R1 + R2] - R3, R4
MOV R1 + R2, [R3 - R4]

To achieve this, we will need to store an additional ALU operation and bus2 driver in the instruction. Fortunately, we have room to spare in these instructions! Below we see the instruction words for load and store operations as they are currently in use.

load0010unused[src reg]unuseddst reg
store0011unusedsrc reg[dst reg]unused

To support the new indexing modes, we need to modify the instruction words to contain two complete ALU specification instead of one. We shall call them the "top alu" (suffix T) and "bottom alu" (suffix B) configuration, and they both consist of an ALU operation (OP, 4 bits) and the driver specifiers for bus1 (B1, 4 bits) and bus2 (B2, 4 bits).

opcodetop alubottom aludst reg
load0010OPTB1TB2TOPBunusedB2Bdst reg

The load operation always uses the contents of the addressed memory location as input for bus1 during bottom alu, leaving B1B unused. The store operation stores the result into the addressed memory location, leaving the destination register unused.

The code than handles the instruction in the control unit has not become significantly more complex, even through we need to juggle a few more parameters:

when X"2" =>;
        -- Indirect register load
        case PHASE is
                when X"0" =>;
                        -- Tranfer bottom ALU operation into MAR        
                        CONTR <= X"0000" & IR(15 downto 4) & X"C";      
                        PHASE <= unsigned(phase) + 1;
                when others =>
                        -- Transfer MBR + top ALU into output register
                        CONTR <= X"0000" & IR(27 downto 24) & X"D" & IR(19 downto 16) & IR(3 downto 0);
                        -- end of instruction, load the next instruction
                        PHASE <= (others => '0');
                        PC <= unsigned(PC) + 1; 
                end case;
when X"3" =>;
        -- Indirect register store
        -- (Store the contents of rN at the address in rM)
        case PHASE is
                when X"0" =>
                        -- Tranfer bottom ALU operation into MAR        
                        CONTR <= X"0000" & IR(15 downto 4) & X"C";      
                        PHASE <= unsigned(phase) + 1;
                when others =>
                        -- Transfer top ALU operation into MBR
                        CONTR <= X"0000" & IR(27 downto 16) & X"D";                                                     
                        -- end of instruction, load the next instruction
                        PHASE <= (others => '0');
                        PC <= unsigned(PC) + 1; 
                end case;

As always, the complete, synthesizable VHDL for this project is available on bitbucket.

Of course, with all these funky addressing modes, assembling instructions by hand is starting to become quite a chore. What we need, is an assembler!

Tuesday, July 24, 2012


XENTRAL is a simple Harvard Architecture CPU. It targets the Spartan6 FPGA present on the Digilent's Nexys3 board, but it's all written using portable behavioral VHDL1. You should be able to use your favorite FPGA, synthesizer and simulator.

All project files are available on github under the Creative Commons Attribution Non-Commercial (CC-by-NC) license.

Architecture Overview

XENTRAL is a Harvard Architecture processor. This means that code and data are handled separately.
The picture above is a very high-level schematic representation of the XENTRAL architecture. Instructions are fetched from ROM3. Based on the instruction, the control unit manages the execution unit. The execution unit performs the actual execution of the instruction, by e.g. adding the contents of two CPU registers, writing a value in RAM or performing I/O.

XENTRAL is a Harvard Architecture processor. This means that code and data are handled separately. This effectively precludes the use of self-modifying code. It also prevents the processor from loading its own program. The program must be programmed into the instruction memory. This is a common approach for embedded devices and microcontrollers, but it does not play nice with operating systems and boot loaders. As the project grows, we will re-visit this limitation.

Overview of the Execution Unit

Schematic representation of the execution unit

Tuesday, May 22, 2012

LightDHT - A lightweight python implementation of the Bittorrent distributed hashtable

I have always been fascinated by distributed data structures, but my efforts have always been dampened by the fact that they are really awkward to play with. In order to get anything that resembles real-life usage, you need to be running hundreds of nodes. Needless to say, wrangling hundreds of processes gets rather cumbersome. Fortunately, there are already thousands of people out there that are participating in one of the largest distributed data structures in operation today: the bittorrent distributed hashtable.
In order to facilitate experimentation with this wonderful system, I have written a simple python implementation of a DHT node. It is mostly (with a few exceptions -- see below) compliant with BEP0005. It should be able to act as a full member of the DHT, processing requests without disrupting the network.
Interested in playing around? Get the code on github!
From the documentation:
The aim of LightDHT is to provide a simple, flexible implementation of the Bittorrent DHT for use in research applications. If you want to trade files, you have come to the wrong place. LightDHT does not implement the actual file transfer parts of the bittorrent protocol. It only takes part in the DHT. 
The main philosophy of LightDHT comes down to two considerations: ease of use over performance and adaptability over scalability. 
In order to keep LightDHT easy to use, all DHT RPC calls are performed synchronously. This means that when you call a DHT method, your program will block until you have an answer to your request. That answer will be the return value of the function. This has the advantage that it keeps the logical program flow intact, and makes it more comfortable to use. 
In order to maintain O(log N) scaling across the network, BEP0005 (the standard governing the DHT) mandates that implementations use a bucket-based approach to the routing table. This enables the node to fulfill all requests in constant time and (more or less) constant memory. In LightDHT, we throw that recommendation to the wind. 
Since the main focus of LightDHT is reseach, we are going to keep around all the data we can. This means that we keep around every single node we know about. Since in practice the number of nodes is limited and the request rates are rather low, we do not bother keeping the routing table organized in a tree structure for quick lookups. Instead we keep it in a dictionary and sort on-demand. The performance penalty is well worth the reduced complexity of the implementation, and the flexibility of having all nodes in an easy to use data structure.

Saturday, December 10, 2011

Festive and Nerdy

You might have seen those big outdoor displays where every pixel is composed out of a little three-color LED light. In this post I'll tell you how I turned a string of those pixel lights into Christmas tree lights!

Adafruit industries, one of my favorite suppliers, recently started selling strings of a particularly advanced model of these pixel lights. Have a look at the product image, and tell me if it doesn't already look like a string a Christmas tree lights..
"12mm Diffused Digital RGB LED Pixels"
As you might have seen in my previous post, these pixels are rather easy to control from a micro-controller. However, stuffing a breadboard in your tree is not exactly festive. Thankfully, you can get these handy little tins that are just about the right size for a wall-wart plug, a teensy 2.0 with female headers (courtesy of Pieter Floris) and a big mess of wires.

A wall-wart plug, a Teensy2.0 and a big mess of wires.
Note that, thanks to the female-pin Teensy, everything is wired up using jumper wires. That's right - this is a no-solder project. Because the underside of the Teensy is not isolated, and because I have a number of wires in there that might come loose, I have isolated the interior of the tin using electrical tape. This is most likely not compliant with any code enacted after 1953 or so, so any imitation is purely at your own risk.

The only remaining question is how to fasten the plugs and wires onto the enclosure. Ideally you would use specially molded plastic caps and bits, but the odds of finding them in the right size and shape at your local hardware store are quite slim. Instead, I liberally applied black sugru around the holes I had cut into the tin, and then pressed the wires and the plug into it. The sugru just molds around the objects. After a quick touch-up I left it to cure overnight, and the result was really really good.

Now we just pop closed the tin and voila, a presentable programmable Christmas tree lights driver!

The controller and a single string of LED pixels, running a test pattern.
And here's the tree, all festive and nerdy:

Festive and Nerdy

Sunday, December 4, 2011

It's the season! (preview)

I while ago my eye dropped on this string of individually addressable RGB LEDs, and I realized they are practically ready-made Christmas lights. A quick order, a long wait, and an evening with my Teensy++ later, I have a sneak preview of our new Christmas tree decoration!

A WS2801 LED string is practically already a Christmas tree decoration!
Since this creation will actually be used "in production" I need to spend some time making it "living-room acceptable". I can tell you this will involve some patience and a lot of Sugru!


Wednesday, November 23, 2011

PCB as a Capacitive Soil Moisture Sensor

Electronics enthusiasts have been designing ways of automating plant care for decades, with mixed results. A traditional weak point of these installations is the resisitive moisture sensor, that is inaccurate and prone to degradation. In this post we will design and fabricate an inexpensive capacitive soil moisture sensor out of a printed circuit board that exhibits none of the weaknesses of its resistive brethren.