March 2015 – Equals Zero

Let it be known! I shall not update my blog with the Motorama event report until all other MIT-based Motorama competitors update theirs. If you see the Part II, it means I’ve finally been satisfied… or got sick of waiting. I am a man of unyielding principle. For some background on this, through a spectrum of high-pressure sales tactics and peer pressure, an astounding number of new builders are attending; by my estimates, we had 6 totally new entries (and attendant totally new builders). One group elected to compete in the Antweight (1lb) tournament that runs on Friday, something that even I don’t normally do!

As I mentioned last time, none of the bots were in bad shape pre-Motorama, but I’ve been expending immense energy trying to get RageBridge v2 working in time. So why not start with it?

RageBridge v2

Previously in the adventures of RageBridge, I had gotten the bulk of the automatic input-reading organized through a depth of register-diddling greater than what I’ve ever dared do before. If you know me and how I program embedded systems, you know that registers are my most favorite things about the process, and there’s nothing I look forward to more when starting a new microcontroller project than hunting through the thousand-page programming manual and trying to write bit manipulation expressions.

…

Now, with the retching out of the way, time to try writing an input-taking routine.

One programming challenge for RageBridge v2 was the automatic input-mode detection. I don’t have a rail of jumpers to select the operating mode, and don’t intend on having one since the board is already SLAM-PACKED WITH RAGE!™ If I can get a reliable automatic input detection, then great! If not, then different firmwares will be made available.

The inputs to decide between are R/C, Serial packets, and analog. On power-up, Rage will take a second to let the inputs stabilize and any partner electronics like receivers and master microcontrollers boot up. Then it will, every second, check for…

Serial buffer byte presence. It will read the first few bytes of the serial buffer and check for 64 or 192 (out of 255, an 8-bit value). These are “idle” commands for left and right motors. This convention is common to several robot controllers that have “simple serial” protocols, where 1 – 127 controls one motor and 128-255 controls the other. If these bytes are present, then assume Serial operation.
If there are no valid serial bytes, it will set up the channels 1 and 2 input pins for analog voltage reading. Additionally, it checks if the MIX jumper is set. MIX assumes “joystick mode” where the sticks are centered at 2.5V, so both inputs have to be at this level (within a small deadband) to select that mode. Else, if MIX is not in, it assumes 0-5V with 1-4V as the active range, meaning the signal must be approximately 1V (within a small deadband) to select.
If the analog voltages aren’t within the correct band (or it’s pegged to 5V due to the internal pullups) it will initialize the R/C inputs and wait to collect input pulses. Once valid input pulses for both Channels 1 and 2 are received – which must be 1500us +/- a deadband – then the mode is selected. The forced neutral-only mode selection acts as a zero-power-on-startup check too.
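For illustration, the “simple serial” convention from the first check can be sketched as below. The function and struct names are hypothetical, and treating values below the idle byte as reverse is an assumption borrowed from how such protocols commonly behave, not something taken from the Rage firmware.

```cpp
#include <cstdint>

// Hypothetical decoded command: which motor, and a signed speed around idle.
struct SerialCommand {
    bool valid;   // false for byte 0, which the text leaves unassigned
    int motor;    // 0 = first motor (1..127), 1 = second motor (128..255)
    int speed;    // assumed: negative = reverse, 0 = idle, positive = forward
};

// Decode one "simple serial" byte. 64 and 192 are the idle values from the text.
SerialCommand decode_simple_serial(uint8_t b) {
    if (b == 0) return {false, 0, 0};
    if (b <= 127) return {true, 0, static_cast<int>(b) - 64};
    return {true, 1, static_cast<int>(b) - 192};
}

// True if the byte is one of the two idle commands used for mode detection.
bool is_idle_byte(uint8_t b) {
    return b == 64 || b == 192;
}
```

Only `is_idle_byte()` matters for mode selection; the decoder is there to show why 64 and 192 sit in the middle of their respective ranges.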

This pattern repeats once per second, and the motors will not drive until the appropriate signals are received. Overall, the code looks something like…
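A minimal sketch of one detection pass, in the order described above (serial, then analog, then R/C). This is not the actual Rage firmware: the struct, the function names and both deadband values are assumptions, and requiring both channels to sit at 1V in unmixed mode is a guess at what “the signal” means.

```cpp
#include <cstdlib>

enum class InputMode { None, Serial, AnalogMixed, AnalogUnmixed, RC };

// Snapshot of everything one detection pass looks at (all fields hypothetical).
struct InputSnapshot {
    bool serial_idle_seen;  // a 64 or 192 byte was found in the serial buffer
    bool mix_jumper;        // MIX jumper installed
    int ch1_mv, ch2_mv;     // analog readings in millivolts
    int ch1_us, ch2_us;     // captured R/C pulse widths in microseconds (0 = none)
};

const int ANALOG_DEADBAND_MV = 150;  // assumed value
const int RC_DEADBAND_US = 50;       // assumed value

static bool within(int value, int target, int band) {
    return std::abs(value - target) <= band;
}

// One pass of the check order: serial first, then analog, then R/C.
InputMode detect_mode(const InputSnapshot& in) {
    if (in.serial_idle_seen) return InputMode::Serial;

    // Neutral is 2.5V with MIX installed, about 1V without it.
    int neutral_mv = in.mix_jumper ? 2500 : 1000;
    if (within(in.ch1_mv, neutral_mv, ANALOG_DEADBAND_MV) &&
        within(in.ch2_mv, neutral_mv, ANALOG_DEADBAND_MV)) {
        return in.mix_jumper ? InputMode::AnalogMixed : InputMode::AnalogUnmixed;
    }

    // Pins pegged at 5V by the pullups fall through to here.
    if (within(in.ch1_us, 1500, RC_DEADBAND_US) &&
        within(in.ch2_us, 1500, RC_DEADBAND_US)) {
        return InputMode::RC;
    }
    return InputMode::None;  // keep the motors off and try again next second
}
```

Because every branch only accepts a neutral signal, a stick held off-center at power-up returns `InputMode::None`, which is the zero-power-on-startup behavior mentioned above.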

I had thought the Analog part would be the most difficult, with the deadband requirement and all. But in fact it was quite easy. With a few test potentiometers and scooter throttle handles, I could get it to reliably pop into Mix or non-mix mode every time. R/C mode proved to be the issue – no matter how much delaying or waiting I had it do, across different receivers even, I couldn’t get R/C mode to work!

As a bit of background, I use the PinChangeInt library to capture R/C pulsewidth signals without having to use the blocking (prevents rest of program from doing anything) pulseIn() function in Arduino. After writing up a quick R/C only input-taking test program, I noticed something peculiar. All of the inputs read, no matter one channel or the other, simply looked like they were overflowing the variable over and over:

The PinChangeInt interrupt service function takes the difference between the micros() values captured by two functions, one on the rising edge and one on the falling edge, to get the delta-time. micros() returns an unsigned long (32 bit), and my R/C pulsewidth variables are signed integers (to be used later in mapping/scaling math). The pattern of values said to me that something was just counting further and further up – not taking the difference at all.
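To make that symptom concrete, here is a small host-side sketch (hypothetical function names) of the two cases: a proper timestamp difference, and what a 16-bit signed variable ends up holding when a raw, ever-growing timestamp is stored in it instead. It assumes the 16-bit `int` of AVR-based Arduinos.

```cpp
#include <cstdint>

// Elapsed time between two micros()-style timestamps. Unsigned subtraction
// stays correct even when the 32-bit counter wraps between the two readings.
uint32_t elapsed_us(uint32_t start_us, uint32_t end_us) {
    return end_us - start_us;
}

// What a 16-bit signed variable holds if a raw timestamp is stored in it,
// i.e. when the subtraction never takes place: only the low 16 bits survive.
int16_t truncate_to_int16(uint32_t value) {
    return static_cast<int16_t>(static_cast<uint16_t>(value));
}
```

With a correct difference the result sits near 1500 regardless of how long the board has been running. With a raw timestamp, the stored value climbs, wraps negative past 32767, and climbs again, which matches a variable that appears to overflow over and over.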

Checking with an oscilloscope confirmed that only one interrupt was being serviced:

I set and cleared a pin on the chip whenever the rising edge or the falling edge interrupt was being triggered. As expected, only one edge appeared. With this test, however, I couldn’t tell if it was the rising or falling edge…

…especially when it seemed that SOMETIMES, both interrupts were working! This shot clearly showed a 1000us pulse width, with two triggers. So what the hell was going on?

By the way, this is the first official usage of the DSO Nano 4 channel digital scope I picked up in Shenzhen! All of the initial debugging shenanigans happened at my desk as a result.

I reached out to the Arduino Forum, where the developer of PinChangeInt had a thread about the library, and learned some interesting facts… namely, that the code should never have worked the way I wrote it in RageBridge 1, and a couple of projects before that.

Well, hmmm.

To clarify without having to read the thread, here’s the difference. I’ve written all my R/C input code in the form of two separate interrupts, RISING and FALLING edge, since 2011 when I first started using the library. The form is similar to this, which was an excerpt from TinyCopter:

The takeaway is that there are two interrupts per pin. Allegedly, this should never have worked!

Yet, compiled in the latest version of Arduino, with the latest version of PinChangeInt, this is what happened. The same kind of nonsense I saw in my initial testing. It seemed to me that only the interrupt I defined last was executing, which contradicted what I saw on the oscilloscope – that sometimes both interrupt service routines did execute. But I freely admit that this could have been an oscilloscope artifact.

Pictured above is the structure that SHOULD HAVE been used this whole time – a single interrupt service routine, executed whenever the pin CHANGE’d states, and which subsequently read the state of the pin:

As the collected 1500us (approximate) pulsewidth shows, this works fine.
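For reference, the single-handler structure can be sketched without any Arduino dependencies as below. The names are hypothetical and the library attach call is deliberately left out; the point is only that one handler runs on every change and branches on the pin state it reads.

```cpp
#include <cstdint>

// State for one R/C channel (names are hypothetical).
struct RcChannel {
    uint32_t rise_us = 0;    // timestamp of the most recent rising edge
    int16_t pulse_us = 0;    // last completed pulse width
    bool got_pulse = false;  // set once a full high pulse has been measured
};

// Body of a single pin-change handler. `pin_high` is the pin state read
// inside the handler, `now_us` is the micros() value at that moment.
void on_pin_change(RcChannel& ch, bool pin_high, uint32_t now_us) {
    if (pin_high) {
        ch.rise_us = now_us;  // pin just went high: start timing
    } else {
        // Pin just went low: the pulse width is the time since the rise.
        ch.pulse_us = static_cast<int16_t>(now_us - ch.rise_us);
        ch.got_pulse = true;
    }
}
```

One caveat of this sketch: a falling edge that arrives before any rising edge has been seen produces a meaningless width, so a real implementation would want to discard the first measurement or validate its range.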

So what gives? Not even the developer knows, apparently! My “two interrupt” form was never supposed to work. I can only assume that between Arduino version 1.0.1 from 2013 and the current 1.6, and from the first version of the PinChangeInt library I used to the current, that enough changed such that whatever loophole I used was no longer possible.

Either way, with the solution found, I backtracked a little and modified my “two interrupt” program to show which interrupt “won”, so to speak.

I changed the two-interrupt type to only increment or decrement a variable depending on which component was called. As I suspected, the last ISR to be declared always “won”. Here, vol_ch1_pw is being sampled by the main program loop, which shows it always decreasing. If I swapped the declaration order, it would keep increasing.

Whatever. Chalk it up to falling victim to the stochastic nature of Arduino developers, and y’all can lecture me about this is why you always use the programming manual yadda yadda I-don’t-care. It works now:

The PINC & [byte gibberish] portions of each ISR simply “select” the pin to read. Notice how it starts at 0b00000100 – this is because the first two bits represent the first two pins of port C, PC0 and PC1, which are reserved for the current sensors. PC2, PC3, and PC4 are the three channels of input.
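As a small illustration of that pin selection, the masks for the three inputs look like this. The constant names and the channel-to-pin order are assumptions; `pinc_value` stands in for a read of the PINC register so the sketch can run on a desktop.

```cpp
#include <cstdint>

// Bit masks for the three input channels on port C. PC0 and PC1 are skipped
// because they belong to the current sensors.
const uint8_t CH1_MASK = 0b00000100;  // PC2 (assumed to be channel 1)
const uint8_t CH2_MASK = 0b00001000;  // PC3 (assumed to be channel 2)
const uint8_t CH3_MASK = 0b00010000;  // PC4 (assumed to be channel 3)

// Returns true if the selected pin reads high in a snapshot of PINC.
bool pin_is_high(uint8_t pinc_value, uint8_t mask) {
    return (pinc_value & mask) != 0;
}
```

ANDing with a one-bit mask zeroes every other pin, so the result is non-zero only when the selected pin is high, no matter what the current sensor pins are doing.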

With that mystery resolved, the entire automatic mode detection code worked instantly:

Shown above is Analog (unmixed, with a scooter throttle) and R/C mode inputs. Serial inputs are hard to capture on video since it’s one microcontroller talking to another. I assigned different blink patterns for the different modes to help during debugging, but they will be retained for actual operation.

The blink routine is interesting in its own right. It’s the first time I’ve used an otherwise unused hardware timer (Timer 2 in this case) exclusively as a blinking machine. Here’s the code snippet that makes it work:

The software “prescaler” portion was carried over from my previous motor controllers’ code, where it was part of the main loop. Either way, something executes that if() and builds up the prescaler variable. After enough times of this happening, it executes the blink part of the code. Timer 2 was set up during the initialization as a simple counter with a hardware prescaler of 64 from the system clock of 16MHz. This resulted in a counting speed of 250kHz – every 4 microseconds, the count would increment. Left to its own devices, this count would overflow after 256 counts, or 1.024 milliseconds. To obtain a clean (to us humans) number, we start the count at 6 (BLINK_INITIAL_VALUE) such that it only has to count 250 times per overflow, giving a clean 1 millisecond timer. With some frills, that’s basically the millis() function re-implemented. I could have included this in the main loop as a call to millis() and it probably would have worked just fine, but I wanted it on a separate interrupt to not hog main program loop time.
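The timing arithmetic can be checked with a few constants. This is a host-side sketch: `BLINK_INITIAL_VALUE` and `BLINK_INTERVAL` are names from the post, everything else (including `on_timer_overflow()`, which stands in for the Timer 2 overflow interrupt) is hypothetical, and no hardware registers are touched.

```cpp
#include <cstdint>

const uint32_t F_CPU_HZ = 16000000UL;    // 16MHz system clock
const uint32_t TIMER_PRESCALE = 64;      // hardware prescaler
const uint32_t BLINK_INITIAL_VALUE = 6;  // timer preload value
const uint32_t BLINK_INTERVAL = 100;     // software prescaler target

// Derived timing, kept in integer units.
const uint32_t TICK_HZ = F_CPU_HZ / TIMER_PRESCALE;          // 250000 counts/s
const uint32_t TICK_NS = 1000000000UL / TICK_HZ;             // 4000 ns per count
const uint32_t FREE_RUN_OVERFLOW_US = 256 * TICK_NS / 1000;  // 1024 us
const uint32_t PRELOADED_OVERFLOW_US =
    (256 - BLINK_INITIAL_VALUE) * TICK_NS / 1000;            // 1000 us

uint32_t blink_prescaler = 0;  // counts overflows since the last blink step
uint32_t blink_steps = 0;      // counts how many times the blink code ran

// Stand-in for the overflow handler: called once per simulated overflow.
void on_timer_overflow() {
    if (++blink_prescaler >= BLINK_INTERVAL) {
        blink_prescaler = 0;
        ++blink_steps;  // the real code would update the LED here
    }
}
```

Calling `on_timer_overflow()` 1000 times simulates one second of 1 ms overflows and advances the blink step ten times, i.e. one step every 100 ms.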

Next up is that horrific line of bit manipulation that somehow worked the first time. What this line does is turn PB5 – a.k.a Arduino pin 13, the LED, on and off according to the state of a byte that is being used purely as an 8-bit array, where a 1 means “LED on” and 0 means off. current_blink_pattern is this byte, and it’s assigned whenever the mode changes so the blinking pattern changes.

The line first takes PORTB (the output status of the pins) and masks off PB5. Next, it grabs a copy of current_blink_pattern and shifts it right by some number of bits – basically an array counter that is reset at the bottom every time it exceeds 7. It bitwise ANDs this with 0x01 to isolate the least significant bit (rightmost bit). Finally, it moves this bit leftwards 5 positions to line up with PB5, and jams the two together.

The result is that PB5 is always turned off every cycle but the result of (current_blink_pattern >> blink_index) & 0x01 turns it back on (or leaves it off for this cycle).
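Written out as a pure function so it can be tried off the chip, the operation described above looks roughly like this. `apply_blink()` and `next_blink_index()` are hypothetical names, and `portb` stands in for the PORTB register.

```cpp
#include <cstdint>

const uint8_t LED_BIT = 5;  // PB5, Arduino pin 13

// Compute the new PORTB value for one blink step: clear PB5, then OR in
// bit `blink_index` of the pattern, shifted up to the PB5 position.
uint8_t apply_blink(uint8_t portb, uint8_t pattern, uint8_t blink_index) {
    uint8_t cleared = portb & static_cast<uint8_t>(~(1u << LED_BIT));
    uint8_t led = static_cast<uint8_t>(((pattern >> blink_index) & 0x01) << LED_BIT);
    return cleared | led;
}

// Advance the index through the 8 pattern bits, wrapping back to 0 after 7.
uint8_t next_blink_index(uint8_t blink_index) {
    return (blink_index >= 7) ? 0 : blink_index + 1;
}
```

For example, with a pattern of 0b00000001 the LED bit is set only at index 0 and cleared for the other seven steps, while every other PORTB bit passes through unchanged.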

With BLINK_INTERVAL set to 100 and the interrupt happening every 1 ms, the result is a 0.1 second blink duration where the LED can be on or off. Now, strictly speaking, with 8 bits I can only make 0.8 second-long blink patterns. That’s fine, I don’t care.

Yes, I get all the way to the end of this timing problem and don’t even care whether I get clean 1-second long patterns.

One thing that all the scoping to get to the bottom of the interrupt issue did turn up was that the input filter capacitance I put on the input pins, which are shared by both analog and digital signalling, was too great.

Ouch, that’s a 10us rise time there. The fall time is about twice as fast, but still bad. I put 4.7nF input caps on the pins, which I guess is too much for another little microcontroller’s pins to drive. I might have to compromise analog signal stability (or implement software filtering at some point) in favor of making the edges sharper.
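As a rough sanity check, the rise time and capacitance can be related with the usual single-pole RC approximation. This assumes the 10us figure is a 10-90% rise time and that the driving side behaves like a plain resistance, neither of which the scope shot confirms; the function names are hypothetical.

```cpp
#include <cmath>

// 10-90% rise time of a single-pole RC low-pass: t_r = ln(9) * R * C.
double rise_time_s(double resistance_ohm, double capacitance_f) {
    return std::log(9.0) * resistance_ohm * capacitance_f;
}

// Inverse: the source resistance implied by a measured rise time and known C.
double implied_resistance_ohm(double rise_time, double capacitance_f) {
    return rise_time / (std::log(9.0) * capacitance_f);
}
```

Under those assumptions, `implied_resistance_ohm(10e-6, 4.7e-9)` comes out a little under 1 kilohm. Since rise time scales linearly with capacitance for a fixed source, a tenth of the capacitance would give roughly a tenth of the rise time.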

Once the bizarre library behavior was accounted for, and the automagic mode selector finished, the rest of the code was actually quite easy. In concept, it was just like RageBridge 1, but better implemented.

First, the inputs are sampled and cleaned, with erroneous signals replaced with the “last known good” one and a signal-bad flag set. Next the signal is mapped to an output variable. The output variable is bounded by a maximum increment per loop (ramped), and the result deciphered into PWM + direction and output to the drivers.
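A minimal sketch of that pipeline for the R/C case, leaving the ramp step out. All names, the validity limits and the 8-bit PWM range are assumptions made for illustration rather than values from the Rage code.

```cpp
const int RC_MIN_US = 1000;    // assumed full-reverse pulse width
const int RC_MID_US = 1500;    // neutral
const int RC_MAX_US = 2000;    // assumed full-forward pulse width
const int RC_VALID_LO = 900;   // assumed sanity limits for a "good" pulse
const int RC_VALID_HI = 2100;
const int PWM_MAX = 255;       // assumed PWM resolution

struct CleanInput {
    int pulse_us = RC_MID_US;  // last known good pulse width
    bool signal_bad = false;   // set when the newest sample was rejected
};

// Replace an out-of-range sample with the last known good one and flag it.
void clean_sample(CleanInput& in, int raw_us) {
    if (raw_us < RC_VALID_LO || raw_us > RC_VALID_HI) {
        in.signal_bad = true;  // keep in.pulse_us unchanged
        return;
    }
    in.signal_bad = false;
    in.pulse_us = raw_us;
}

// Map a pulse width to a signed command in [-PWM_MAX, PWM_MAX].
int map_to_command(int pulse_us) {
    if (pulse_us < RC_MIN_US) pulse_us = RC_MIN_US;
    if (pulse_us > RC_MAX_US) pulse_us = RC_MAX_US;
    return (pulse_us - RC_MID_US) * PWM_MAX / (RC_MAX_US - RC_MID_US);
}

struct DriveOutput {
    int pwm;       // magnitude sent to the PWM hardware
    bool reverse;  // direction bit
};

// Split a signed command into PWM magnitude plus a direction bit.
DriveOutput to_pwm_and_direction(int command) {
    if (command < 0) return {-command, true};
    return {command, false};
}
```

Keeping the three stages as separate functions mirrors the idea of isolating each step, even though a real firmware version would likely operate on globals as described.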

Whereas in RageBridge 1 all of this took place messily within 1 loop, I organized it all into procedures so the different modes can be better isolated, even though they still acted on global variables:

Of course, right now, I only have the R/C code written.

I did away with the incredible machine that was the mapWithRampingAndDeadband() function of RB1. That was basically broken up into its constituents – the deadband is handled in the input processing stage, and the ramping output mapping is just a compare and += or -=.
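Split apart, the two pieces might look something like the sketch below. The function names and both constants are hypothetical, and the clamp that stops the ramp from overshooting its target is an addition of this sketch.

```cpp
const int DEADBAND_US = 25;  // assumed deadband half-width
const int RAMP_STEP = 8;     // assumed maximum change per loop

// Input stage: snap anything close to center onto exactly center.
int apply_deadband(int pulse_us, int center_us) {
    int offset = pulse_us - center_us;
    if (offset > -DEADBAND_US && offset < DEADBAND_US) return center_us;
    return pulse_us;
}

// Output stage: move `output` toward `target` by at most RAMP_STEP per call.
int ramp_toward(int output, int target) {
    if (output < target) {
        output += RAMP_STEP;
        if (output > target) output = target;  // do not overshoot
    } else if (output > target) {
        output -= RAMP_STEP;
        if (output < target) output = target;  // do not overshoot
    }
    return output;
}
```

Calling `ramp_toward()` once per loop limits how fast the output can change, so a stick slammed from full forward to full reverse turns into a slope instead of a step.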

Soon, it was time to test. Without having made compatible heat sinks yet, I, uhh, created an innovative board clamping system to secure it to the old style Ragebridge 1 heat sinks.

Innovative. That’s the word.

It’s totally not “two binder clips and some electrical tape”.

The test subject was Clocker. By this point, it was Wednesday afternoon, so this was going to be the determinator as to whether or not I try running the new boards at Motorama.

During a few on-the-bench tests of pegging the throttle back and forth, I was able to get the board to reset several times. This is not a good sign, as it shows the design has noise coupling into it in some place that is making it all the way to the logic. I have a few suspects, namely the long analog power and ground traces feeding the current sensors and the less than optimal placement of the logic power supply taps from the main board power planes.

My first battle tactic is always to insert capacitance where it doesn’t belong to see where the noise was coming from. For starters, all the 0.1uF logic circuit buffering capacitors were changed to 1uF. Now, strictly speaking, I suppose it was capacitance that was there in RageBridge v1 but was removed, since all of those on version 1 were 1uF to begin with. I also changed the primary input bypass capacitor (on the Vcc pins of the micro) from 1uF to a 10uF.

For the most part, as far as I could tell, this resolved the issue. But capacitors are band-aids to the real problem, which is layout. I’m going to have to take a closer look at how the logic power is feeding in and out of the big patches of battery voltage and ground planes, all being switched hard tens of thousands of times a second.

There was also one particular pin connection on the A3941 gate drivers which I was unclear about, which may also contribute to noise coupling via ground fluctuation, but I will address that once I put together some more boards. In the meantime, I taped the board back into Clocker for some hard driving.

For the most part, the test went quite well…

This test revealed a very important hidden flaw in the drivetrain that I’m glad was caught. Namely, when I last took apart the DeWuts on Clocker to change the motors, I neglected to recheck the torque-limiting clutch screws. The clicking, clutch-slipping noise in the video is exactly that. A little bit of hex key wiggling – no sprocket removal even needed – and things were tightened up again.

During this and subsequent bench tests, I couldn’t get it to lock up or reset no matter how much I gunned the motors around. In fact, I did it so hard that…

What you see is a thin pall of DeWalt smoke from some rapid input reversal testing gone wrong. It seems that one of the motors had a borderline commutator short, which quickly became non borderline when rapidly smashed with +/- 26 volts.

The resulting quick short also took out one gate drive chip and two FETs on the half of the controller driving that motor. I replaced them both in 5 minutes, along with sending this motor to the pile of sad motors.

I ended up deciding not to run RageBridge v2 (board revision 1) in Clocker for this event, since I didn’t want to have to do ESC debugging in the field while my match is 2 minutes away. What I did do in the remaining day was add in the much-requested combine mode, which hard-parallels the two channels together at the hardware level. This enables Rage to function as a single large 1 channel ESC.

That part was easy. In combine mode, only channel #1’s current sensor is read and the assumed current is just twice that. The current control loop output calculated for channel 1 is applied directly to channel 2, making the assumption that the two halves, driven off the same PWM timer with the same gate driver, will switch on (within a small fraction of the total switching time) with each other. This is not the most robust method, as it totally slaves channel 2 to channel 1’s mercy, and a failure on channel 2 will probably cause the whole thing to grenade. Maybe it’ll get revisited shortly.
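In sketch form, the combine-mode bookkeeping amounts to the following. The names, the current limit and the back-off rule are all assumptions; in particular, the simple “reduce duty when over the limit” step is only a stand-in for the actual current control loop, which isn't shown in this post.

```cpp
struct CombinedState {
    int total_current_ma;  // assumed total: twice channel 1's reading
    int duty_ch1;          // PWM duty applied to channel 1
    int duty_ch2;          // channel 2 simply copies channel 1
};

const int CURRENT_LIMIT_MA = 60000;  // assumed total current limit
const int DUTY_BACKOFF = 5;          // assumed per-loop duty reduction

// One loop of combine mode. `requested_duty` is the non-negative duty the
// input stage asks for, `previous_duty` is what was applied last loop.
CombinedState combine_step(int ch1_current_ma, int requested_duty, int previous_duty) {
    CombinedState s;
    s.total_current_ma = 2 * ch1_current_ma;  // channel 2's sensor is never read

    int duty = requested_duty;
    if (s.total_current_ma > CURRENT_LIMIT_MA) {
        duty = previous_duty - DUTY_BACKOFF;  // stand-in for the real current loop
        if (duty < 0) duty = 0;
    }
    s.duty_ch1 = duty;
    s.duty_ch2 = duty;  // slaved to channel 1
    return s;
}
```

The sketch also makes the weakness visible: nothing in `combine_step()` ever looks at channel 2, so a fault there goes unnoticed by the control loop.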

Why did I put Combine mode together? Because another bot this time needed it…. and for that, we’ll need to wait for Part 2.
