Saturday, October 16, 2021

Storing data on a cassette using Arduino and Python (Differential Manchester encoding)

    I've been building a retro computer, and it's gotten me interested in using cassettes as data storage. This poses an interesting challenge where binary information has to be converted into something that can be written to, and reliably read from, a cassette. We have to worry about immunity to noise (tape hiss), speed fluctuations (wow/flutter), and amplitude fluctuations (dropout).

    Another limitation is frequency response. Our signal has to stay safely within the range of frequencies a tape can reproduce. This range can be as narrow as 400-4,000Hz for something like a microcassette. We could send a stream of bits at a safe 2kHz, but what if we then have a very long run of all zeros (or ones)? Our signal would dip below 400Hz, and our data would be lost. 

frequency response of my Pearlcorder L400 microcassette recorder


    One solution is to toggle our output at least once per bit. At 2,000 bits per second, two bits then make a full cycle, which guarantees a minimum frequency of 1kHz. The presence of an additional toggle in the middle of a bit can represent a zero, and its absence a one. If every bit had an additional toggle, it would yield the maximum frequency of 2kHz. This is the basis of Differential Manchester encoding. 

Differential Manchester encoding - wikipedia
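
    To make the rule concrete, here is a minimal Python sketch of it. The helper names `diff_manchester` and `count_toggles` are made up for illustration. It expands a list of bits into half-bit levels: the level flips at the start of every bit, and flips again mid-bit only for a zero. At 2,000 bits per second, the all-ones pattern is the 1kHz case and the all-zeros pattern is the 2kHz case.

```python
def diff_manchester(bits, level=False):
    """Return a list of half-bit levels (True = high) for a bit sequence.

    Convention from the text: toggle at the start of every bit,
    and toggle again in the middle only for a zero.
    """
    halves = []
    for bit in bits:
        level = not level        # mandatory toggle at the bit boundary
        halves.append(level)
        if not bit:
            level = not level    # extra mid-bit toggle marks a zero
        halves.append(level)
    return halves


def count_toggles(halves, start_level=False):
    """Count level changes, including the first one from the idle level."""
    toggles, prev = 0, start_level
    for level in halves:
        if level != prev:
            toggles += 1
        prev = level
    return toggles


ones = diff_manchester([1, 1, 1, 1])
zeros = diff_manchester([0, 0, 0, 0])
print(ones)    # the level changes once per bit
print(zeros)   # the level changes twice per bit
print(count_toggles(ones), count_toggles(zeros))
```

    Four ones produce four toggles (two full cycles), while four zeros produce eight (four full cycles), which is the octave between the two symbols.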


    Besides fitting nicely in a frequency range, Differential Manchester has other advantages. Cassette recorders rarely concern themselves with the polarity of their signal (since it doesn't affect the sound) and will sometimes invert their output relative to their input. Differential Manchester encoding only uses the presence of these "toggles" or edges, so it is unaffected by being inverted.
    Also, the signal spends an equal amount of time high and low: each zero is half high and half low, and successive ones alternate polarity. This means we have no DC offset. If the offset were irregular, our signal would drift up and down, making decoding more difficult.
    It's fairly resilient in the face of speed warbles too, as we have an octave separating our ones and zeros. In other words, zero is always twice as fast as one. For comparison, one early modem standard used 1300Hz and 1700Hz for one and zero respectively.
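
    Both properties are easy to check on paper or in code. The sketch below is illustrative only, and the helper names are my own. It encodes a short bit pattern, inverts it, and compares the spacing between edges. It also tracks the running balance of high versus low half-bits to show that it stays bounded.

```python
def diff_manchester(bits, level=False):
    """Half-bit levels: toggle at every bit start, again mid-bit for a zero."""
    halves = []
    for bit in bits:
        level = not level
        halves.append(level)
        if not bit:
            level = not level
        halves.append(level)
    return halves


def edge_intervals(halves):
    """Lengths, in half-bits, of the runs between level changes."""
    runs, length = [], 1
    for prev, cur in zip(halves, halves[1:]):
        if cur == prev:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return runs


def peak_imbalance(halves):
    """Largest absolute running sum, counting high as +1 and low as -1."""
    balance, worst = 0, 0
    for level in halves:
        balance += 1 if level else -1
        worst = max(worst, abs(balance))
    return worst


bits = [1, 0, 0, 1, 1, 0, 1, 0]
normal = diff_manchester(bits)
inverted = [not level for level in normal]

print(edge_intervals(normal) == edge_intervals(inverted))  # same edge timing
print(peak_imbalance(normal))                              # never grows past one bit
```

    A run of 2 half-bits is a one and a pair of 1s is a zero, regardless of which way up the signal is.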

    So, Manchester encoding it is! This settles how we encode individual bits, but not how we structure our data. I chose to mimic the standard serial packet structure of "8n1". This means a zero starting bit, 8 data bits, no parity bit, and a one stop bit. This makes it easy to figure out exactly how the data is aligned when receiving.

8n1 - wikimedia commons
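
    As a rough sketch of the framing, the functions below wrap a byte in a start bit and a stop bit and unwrap it again. The names `frame_byte` and `unframe` are hypothetical. The data bits are sent most significant bit first to match the encoder script in this post. A standard UART sends the least significant bit first, so this is not wire-compatible with a serial port.

```python
def frame_byte(value):
    """Return the 10 bits for one byte: start (0), 8 data bits MSB first, stop (1)."""
    data = [(value >> shift) & 1 for shift in range(7, -1, -1)]
    return [0] + data + [1]


def unframe(bits):
    """Recover the byte from a 10-bit frame, or raise on a framing error."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    value = 0
    for bit in bits[1:9]:
        value = (value << 1) | bit
    return value


frame = frame_byte(0x41)   # ASCII 'A'
print(frame)
print(hex(unframe(frame)))
```

    Ten bits on tape per byte of data means the usable byte rate is one tenth of the baud rate.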

    I opted to add a calibration tone to the beginning of my files. This gives the receiver time to detect the amplitude, and more importantly, the frequency of the signal. This tone is simply a long string of ones. The starting bit (zero) of the first byte signifies the end of the tone.
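
    For a sense of scale, here is the arithmetic for the lead-in using the values from the encoder script in this post (32kHz sample rate, 3200 baud, 256 calibration bits). The `split_lead_in` helper is only an illustration of how a receiver could find the end of the tone in an already-decoded bit stream.

```python
SAMPLE_RATE = 32000   # Hz, as in the encoder script
BAUD = 3200           # bits per second, as in the encoder script
CAL_BITS = 256        # ones sent before the first byte

samples_per_bit = SAMPLE_RATE // BAUD
tone_ms = 1000 * CAL_BITS / BAUD
tone_hz = BAUD / 2    # a one is a single half-cycle, so two bits per cycle
zero_hz = BAUD        # a zero is a full cycle per bit


def split_lead_in(bits):
    """Split a decoded bit stream at the first zero (the first start bit)."""
    first_zero = bits.index(0)   # raises ValueError if no data follows the tone
    return bits[:first_zero], bits[first_zero:]


tone, data = split_lead_in([1] * CAL_BITS + [0, 0, 1, 0, 0, 0, 0, 0, 1, 1])
print(samples_per_bit, tone_ms, tone_hz, zero_hz)
print(len(tone), data)
```

    With these settings the tone is an 80ms burst at 1600Hz, and the data that follows alternates between 1600Hz and 3200Hz.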

    I've written a Python script that will take a binary file and output a Manchester encoded audio file that can be recorded directly onto a cassette.

 Python encoder:
#Converts binary file to Differential Manchester encoded audio
# outputs 32kHz, 8bit, mono WAV. 8N1 format at 3200 baud
# includes calibration tone, and checksum. Zack Nelson 2021
import struct, os
from sys import argv

smplrate = 32000 #Hz
baud = 3200 #sample rate / baud must be an even integer (each half-bit is a whole number of samples)

#functions-------------------------------------------------
#each bit starts by inverting the output
#zeros will invert again in the middle
def out_bit(bit):
    global bit_status
    bit_status = not bit_status #toggle
    for x in range(2): #2 half-cycles
        for y in range(int(smplrate/baud/2)): #samples
            if bit_status: buf.append(0xD8) # hi
            else: buf.append(0x28) # lo
        #toggle if bit 0
        if x == 0 and not bit: bit_status = not bit_status

def out_byte(byte):
    out_bit(0) #start bit
    for i in range(8):
        out_bit(bool(byte & (1<<7)))
        byte <<= 1
    out_bit(1) #stop bit
#---------------------------------------------------------
    
try: len(argv[1]) #load arguments
except IndexError:
    print("Input file needed")
    exit(2)

fi = open(argv[1],'rb') #open input
fo = open(os.path.splitext(argv[1])[0]+".wav", 'wb+') #open output

file = bytearray(fi.read())
fi.close()
buf = []

bit_status = False
checksum = 0

for i in range(smplrate): buf.append(0x80) #silence
for i in range(256): out_bit(1) #calibration bits

for byte in file: #add all bytes
    checksum = (checksum + byte) & 0xFF #keep the running sum to 8 bits
    out_byte(byte)

out_byte(checksum) #add checksum

for i in range(smplrate): buf.append(0x80)#silence

#write wave header to file
fo.write(str.encode("RIFF"))
fo.write((len(buf) + 36).to_bytes(4, byteorder='little')) #length in bytes
fo.write(str.encode("WAVEfmt "))
fo.write((16).to_bytes(4, byteorder='little')) #Length of format data
fo.write((1).to_bytes(2, byteorder='little')) #PCM
fo.write((1).to_bytes(2, byteorder='little')) #Number of chans
fo.write((smplrate).to_bytes(4, byteorder='little')) #Sample Rate
fo.write((smplrate).to_bytes(4, byteorder='little')) #Sample Rate * bits * chans / 8
fo.write((1).to_bytes(2, byteorder='little')) #8bit mono
fo.write((8).to_bytes(2, byteorder='little')) #Bits per sample
fo.write(str.encode("data"))
fo.write(len(buf).to_bytes(4, byteorder='little')) #length in bytes

fo.write(bytes(buf)) #write audio to file (every sample is already 0-255)
fo.close()
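
    Before committing a file to tape, it can be handy to check that the WAV decodes back to the original bytes. The following is a sketch of such a check, not part of the original project. It assumes the file came from the encoder above (8-bit unsigned samples, 10 samples per bit, checksum as the last byte), and all function names are my own.

```python
import wave

MID = 0x80   # silence level; samples at or above it count as "high"


def read_samples(path):
    """Load the unsigned 8-bit mono samples from a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.readframes(wav.getnframes())


def samples_to_bits(samples, samples_per_bit=10):
    """Turn samples into bits by measuring the runs between level changes."""
    runs, length = [], 1
    for prev, cur in zip(samples, samples[1:]):
        if (prev >= MID) == (cur >= MID):
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)

    threshold = samples_per_bit * 0.75   # between a half bit and a full bit
    bits, i = [], 0
    while i < len(runs):
        if runs[i] > threshold:
            bits.append(1)               # one long run is a one
            i += 1
        else:
            bits.append(0)               # two short runs are a zero
            i += 2
    return bits


def bits_to_bytes(bits):
    """Skip the lead-in ones, then unpack 8N1 frames (MSB first)."""
    out, i = bytearray(), bits.index(0)
    while i + 10 <= len(bits):
        frame = bits[i:i + 10]
        if frame[0] != 0 or frame[9] != 1:
            raise ValueError(f"framing error at bit {i}")
        out.append(int("".join(map(str, frame[1:9])), 2))
        i += 10
    return bytes(out)


def check(data):
    """Return the payload if the trailing checksum byte matches."""
    payload, checksum = data[:-1], data[-1]
    if sum(payload) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    return payload
```

    Usage would look like `check(bits_to_bytes(samples_to_bits(read_samples("myfile.wav"))))`, where `myfile.wav` is a placeholder name. The silence at either end is read as "high", which at worst merges with or adds to the runs of ones around the data and so does not disturb the framing.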

Hardware Interface


    Now that we can store data onto a tape, we need a way to read it back. First we'll focus on the hardware required to connect the cassette recorder to a computer. 

Tape to computer interface schematic


    To read from a tape, the audio is first highpassed. This reduces potential DC offset, and noise from motor rumble. The audio is then amplified, bringing the tape's ~1V line-level output closer to the 5V we want for the digital signal. The amplification stage also lowpasses the audio, reducing some hiss and noise outside of the range of our signal.
    Next the audio is passed through a Schmitt trigger. This transforms the smooth audio into a rigid, digital signal by comparing it to two thresholds. If the audio signal goes above the high threshold (2.6V), the output is a digital one. If it goes below the low threshold (1.5V), the output is a zero. If the signal hangs out between the two, the output does not change. This provides some noise immunity. As long as the noise doesn't swing enough to push the signal over the wrong threshold, it will simply be ignored.

Schmitt trigger input(U) and thresholds (A, B) - wikipedia
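
    The hysteresis is simple to model in software. This is a sketch of the behavior rather than of the circuit: it uses the 2.6V and 1.5V thresholds from the text, and the sample voltages are invented to show a noisy edge. A plain comparator with a single threshold (2.05V here, picked as the midpoint) is included for contrast.

```python
def schmitt(samples, low=1.5, high=2.6, state=False):
    """Two-threshold comparator: the output only changes outside the dead band."""
    out = []
    for volts in samples:
        if volts > high:
            state = True
        elif volts < low:
            state = False
        out.append(state)    # between the thresholds, keep the last state
    return out


def comparator(samples, threshold=2.05):
    """Single-threshold comparator, for comparison."""
    return [volts > threshold for volts in samples]


noisy = [0.5, 1.0, 1.8, 2.7, 2.4, 2.0, 1.6, 2.5, 1.4, 1.7, 2.0]
print(schmitt(noisy))
print(comparator(noisy))
```

    The Schmitt model gives one clean pulse, while the single-threshold version chatters when the signal wobbles around 2V.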


    Now we have a digital signal, but it's still Manchester encoded. I selected an Arduino to run a proof-of-concept decoding program. It takes in the digital signal from our interface board (via pin D2) and outputs the decoded bytes over serial. 
    To do this, it listens to part of the calibration tone, and calculates the signal's timing. It uses this timing to discern ones from zeros. Three edges close together count as a zero. Two edges far apart count as a one.
    When it detects the first zero (start bit), it begins constructing and transmitting bytes.
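
    The timing logic can be modelled in a few lines of Python. This is a simplified sketch and not a port of the Arduino program: the function names are invented, the threshold is taken as 75% of the average lead-in interval, and an invalid pattern raises an exception instead of being reported and skipped. The intervals are the times between successive edges in microseconds, roughly 312us for a full bit at 3200 baud.

```python
def calibrate(intervals_us, skip=32, take=16):
    """Average `take` lead-in intervals and return 75% of that as the threshold."""
    window = intervals_us[skip:skip + take]
    if len(window) < take:
        raise ValueError("not enough calibration pulses")
    return sum(window) * 3 // (4 * take)


def intervals_to_bits(intervals_us, threshold_us):
    """One long interval is a one; two short intervals in a row are a zero."""
    bits = []
    short_seen = False
    for dt in intervals_us:
        if dt > threshold_us:
            if short_seen:
                raise ValueError("long interval after a lone short one")
            bits.append(1)
        elif short_seen:
            bits.append(0)
            short_seen = False
        else:
            short_seen = True    # first half of a zero, wait for the second
    return bits


FULL, HALF = 312, 156            # approximate edge spacing at 3200 baud
edges = [FULL] * 48 + [HALF, HALF, FULL, HALF, HALF, FULL]
threshold = calibrate(edges)
print(threshold)
print(intervals_to_bits(edges[48:], threshold))
```

    Because the threshold is derived from the tape's own lead-in tone, a deck that runs a little fast or slow shifts both symbols and the threshold together.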
    It has the ability to detect and report framing errors (incorrect start/stop bit placement), and invalid edge patterns. It's unable to recover from these errors though. It would be possible to correct framing issues by buffering bits and searching for valid frames within the buffer.
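
    One way the buffering idea could work is sketched below. This is an assumption about how such a recovery might be written, not the code from this project. It slides along a buffer of decoded bits until several consecutive 10-bit windows all have a zero start bit and a one stop bit. The names and the choice of three frames are arbitrary.

```python
def frame_byte(value):
    """Start bit (0), 8 data bits MSB first, stop bit (1)."""
    return [0] + [(value >> shift) & 1 for shift in range(7, -1, -1)] + [1]


def find_frame_offset(bits, frames_to_check=3):
    """Return the first offset where `frames_to_check` frames line up, else None."""
    span = 10 * frames_to_check
    for offset in range(len(bits) - span + 1):
        if all(bits[offset + 10 * k] == 0 and bits[offset + 10 * k + 9] == 1
               for k in range(frames_to_check)):
            return offset
    return None


def decode_from(bits, offset):
    """Unpack whole frames starting at `offset`, without further checks."""
    out = bytearray()
    for i in range(offset, len(bits) - 9, 10):
        value = 0
        for bit in bits[i + 1:i + 9]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


# Three stray bits, as if reception started in the middle of a byte
stream = [1, 1, 0]
for byte in b"OK!":
    stream += frame_byte(byte)

offset = find_frame_offset(stream)
print(offset, decode_from(stream, offset))
```

    Checking more frames lowers the chance of locking onto data bits that happen to look like framing bits, at the cost of a longer buffer.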


Arduino Decoder:
// Differential Manchester decoder
// Zack Nelson
const byte pulsePin = 2; //interrupt input

int byte_count = 0; //count for printing newlines
uint32_t last_ts = 0; //timestamp of prev edge
byte edge_count = 0; //edges per bit
byte bit_count = 0;
byte rec_Byte = 0;

//calibration-----------------------------------
volatile unsigned int hi_threshold = 0; //hi pulse in uS (set in the ISR, polled in setup)
unsigned int cal_count = 0;
unsigned int cal_ts = 0;
bool lead_in_done = 0;

void setup() {
  pinMode(pulsePin, INPUT);
  
  attachInterrupt(digitalPinToInterrupt(pulsePin), count, CHANGE );
  
  Serial.begin(230400);
  Serial.print("Start. ");
  
  while(!hi_threshold); //Calibration done--------------------------
  Serial.print("High threshold(us): ");
  Serial.println(hi_threshold);
}
  
void loop() { }

void count() { //gets called on every transition of data pin
  if (!hi_threshold){ //Calibration---------------------------------
    if (cal_count == 32) cal_ts = 0; //discard readings 0-31
    cal_ts += (micros() - last_ts); //sum readings 32-47 (16 pulses)
    if (++cal_count == 48) hi_threshold = cal_ts / 21; //16/21 = ~76% of one bit period
  } else { //Receive data--------------------------------------------
    bool bit_val = ((micros() - last_ts) > hi_threshold); //hi or lo?
  
    //lead in check--------------------------------------------------
    if (!lead_in_done && !bit_val) lead_in_done = 1; //first zero

    if (++edge_count > 2) { //error
      Serial.println("Edge cnt err");
      edge_count = 1;
    }
    
    //low bit = 2 fast pulses, high = 1 slow pulse
    if ((!bit_val && edge_count == 2) || (bit_val && edge_count == 1)){
      if (lead_in_done) bitDone(bit_val); //add bit to byte
      edge_count = 0;
    }
  }
  
  last_ts = micros();
}

void bitDone(bool bit_val) {
  //start bit lo, 8 bits MSB first, stop bit hi
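  //only data bits (1-8) go into the byte; avoids a negative shift on a bad start bit
  if (bit_val && bit_count >= 1 && bit_count <= 8) rec_Byte |= (0x80 >> (bit_count-1));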

  if (bit_count == 0 && bit_val) Serial.println("Start err");
  else if (bit_count == 9 && !bit_val) Serial.println("Stop err");
  
  if (++bit_count == 10) { //complete byte?
    //Uncomment to print hex
    /*if (rec_Byte < 16) Serial.print(0); //leading zero
    Serial.print(rec_Byte, HEX);
    Serial.print(", ");
    if (++byte_count % 16 == 0) Serial.println(""); */
    Serial.print((char)rec_Byte); //print ASCII character
    
    bit_count = 0;
    rec_Byte = 0;
  }
}

    Files are available on my GitHub page.

    Here are some images of my setup to read from a microcassette. I was able to use it to read data out at around 3000 baud.



20 comments:

  1. wow that's insane:D
    Have you tried to speed up the playback motor for higher baud? An STM32 could keep up with the speed.
    A stand alone uart datalogger would be awesome :D

    there is an interesting compression for dial up modem PCM:
    https://en.wikipedia.org/wiki/List_of_ITU-T_V-series_recommendations#Error_control_and_data_compression

    Replies
    1. I didn't want to modify the deck, but it is running at the faster of its two speeds, "2.4ips".
      I am modifying an answering machine's transport though, so there might be an opportunity there.
      Ultimately I want a 6502 in the low MHz range to be able to decode this, so that puts a cap on things.

      Some very clever things were done for dial-up! They're a bit beyond me and my hardware though. Ha ha.

  2. Find a ground loop isolator and place it between the Arduino and the cassette.

  3. Sounds great! How much data are you able to store on a 30-minute tape? Do you have an estimate?

    Replies
    1. Thanks! At 3000 baud it works out to about 500KB on a 30 minute tape. I'm assuming the length is spec'd at 2.4ips.

  4. Asynchronous serial uses start and stop bits because the sender and receiver clocks might differ, and because there can be a variable delay between bytes. There is no need for start and stop bits here because every bit is self-synchronizing. Thus start and stop bits unnecessarily reduce bandwidth. The start of a stream can simply contain a series of zeroes with a 1 to start things, or something like this.

    Replies
    1. That's all correct, except if there's an error. I have a different version of the Arduino code that can detect and recover from an error using the framing bits. If that's not needed, then yes, save the bits!

    2. It seems like it would possibly be a better use of those 2 redundant bits per byte setting them to some sort of parity or checksum of the data. You could still resynchronize to the stream by dropping bits until the checksum matches again, and then you would also detect bit corruption. Searching a bit about the topic, it seems like you could go as far as implementing 4-to-5 encoding, where you use 5 bits over the wire to transmit 4 bits of data, and of the 32 possible codes, choose the 16 that have the most desirable transitions.

      By storing square waves, aren't there a bunch of harmonic frequencies that get attenuated and cause the played back signal to be distorted? I suppose it works out because you're using the lower end of the tape's response, and so you get enough of the signal back for the Schmitt trigger to recover.

    3. 4b/5b is an interesting idea, but it works out to the same number of bits and adds complexity to the software side. Ultimately the save/load routines will be part of ROM for my 6502 system, and I want to use minimal space for them (no lookup table for bit encodings). I had looked at 8b/10b though.

      I like the parity bit idea! Maybe I'll keep the start bit, but replace the stop bit with parity. Then you could be quite confident you're lined up, and the data is valid.
      At the same time, I like the simplicity of how it is. I'm shooting for 80s hobbyist-grade tech. The PAiA 8700's tape format was my inspiration, and it uses start/stop no parity.

      Yes, the squarewaves are lowpassed by the recording/playback process. As long as the fundamental frequency is loud enough, it should be decodable though.

  5. This is brilliant and I love that you did this. Not easy.

  6. Have you considered encoding a stereo signal to increase the baud rate?

    Replies
    1. My recorder, along with most microcassette recorders, is mono only. It could be an interesting project though.

    2. You can try going for a different encoding. Manchester halves your available bandwidth (2 transitions per bit on average); going for something like 8b/10b encoding (https://en.wikipedia.org/wiki/8b/10b_encoding) can improve bandwidth (only 20% overhead, instead of 50%), at the cost of more CPU power required for decoding/encoding.

    3. Well, differential Manchester uses an average of 1.5 transitions per bit, and a max of 2. I get your point though.
      You're giving up self-clocking, immunity from reverse-polarity, automatic DC balance, and easy decoding (like you say) for that gain in throughput.

      Interestingly, 8b/10b takes up more bandwidth (in the signal processing sense). Since it allows long runs of 1s or 0s, those result in lower frequencies. If we clocked it at my microcassette's max of 4kHz, a run of five 1s would be a half cycle at 800Hz, and be about 6dB down.

  7. I am currently working on a project to allow for simulated digital music playback from audio cassette tapes, and this seems like it might work better than the Minimodem program currently being used, since wow and flutter is an issue on some playback devices. Not to mention that we are having a hard time getting Minimodem ported to the LyraT platform. The only thing I am not clear on is whether the data can be streamed in real time, or whether the entire data file needs to be read in and then decoded.

    https://github.com/ns96/cassetteflowJava

    https://youtu.be/pvrnPoVNLMM

    Replies
    1. Hi Nathan. That's a really fun project.

      Yes, Manchester encoding can be decoded in real time. That's how my Arduino program does it.
      You should be able to start decoding from an arbitrary point too. I'm using a leading calibration tone so that I know the exact speed and position of the data, but it's not necessary.
      You can figure those out as soon as you find valid framing bits. That should only take a handful of milliseconds. Then each bit can be decoded as soon as it's played in.

  8. Cassette decks are intended to record audio so it is a good idea to give them a signal that looks like sounds. You should look into the way Tandy built their tape interface. They used a six bit DAC for output and a dead simple zero crossing detector for input. They recorded a 1000Hz or 2000Hz sinewave to represent zero or one giving an average bit rate of 1500bps. It was a dead simple circuit and very reliable in real world usage. Transmit software is trivial and receive not much more difficult.

    Replies
    1. I'm not sure what it means for something to "look like sound". The signal is being bandpassed before going to the cassette, so it's not much of a square wave.
      As long as the spectrum and amplitude of the signal are right, what does it matter?

      Is there an upside to Tandy's approach? I'm getting around 3000bps without needing a DAC.

  9. Zack- this is very cool.
    This is somewhat similar to a project that’s been bouncing around my head for a few years, but I haven’t gotten past a problem area. Is there any chance we could do a short Zoom/FaceTime etc. session?
