

Somewhat broad question: What causes PIC lockups?

 
Jeff7



Joined: 22 Sep 2009
Posts: 13

Posted: Thu Oct 08, 2009 9:03 pm

I mentioned some lockups in this thread about some I²C issues, which have been resolved.

Now, I'm not sure how much code I'd be allowed to post, as it is work-related. So forgive me if I go through some convoluted explanations. I'll of course answer questions to the best of my ability.

I'll give some background first though:
- Chip is a PIC18F4520 @ 20MHz, running at 3.3V.
- CCS compiler, v4.062
- The PIC is connected to a DS3231 realtime clock via the hardware I²C pins. Transfer speed has no effect on lockups. 40k, 100k, 400k, doesn't matter.
- The PIC is connected to an XBee wireless module using the hardware UART pins, at 9600bps, which transmits about 15 bytes every 1.5 seconds.
- The PIC is connected to a GPS module via pin C1, using a software UART at 38400bps. Pin C1 is checked every 5 seconds using timed_getc, looking for the GPRMC string. If a valid time is retrieved, the RTC gets updated. The GPS module is configured to send only GPGSV and GPRMC strings.
- Watchdog timer is enabled and active.
- Stackoverflow/underflow reset is enabled.
- There had previously been a large function to hold all menus, and within that function, some other functions were called. One of these functions contained a one-dimensional array of size 212, as well as a 2D array of size 4x70.
- If I omit this function from the #include list (and therefore don't call it), I don't get lockups. HOWEVER, if I do include it, the lockup still occurs even if the function never gets called. Simply including it seems to cause problems.
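The GPS-to-RTC step in the list above could look roughly like the following. This is a portable C sketch, not the author's code: `gprmc_time_bcd()` and its helpers are hypothetical names, and it assumes the sentence has already passed its checksum. It takes the hhmmss field of a GPRMC sentence, refuses to produce a time unless the status field is 'A' (valid fix), and converts the values to packed BCD, which is the format the DS3231 time registers use (seconds, minutes, hours at addresses 00h to 02h; hours here in 24-hour mode).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Convert 0..99 to packed BCD (tens in the high nibble). */
static uint8_t to_bcd(uint8_t v)
{
    return (uint8_t)(((v / 10u) << 4) | (v % 10u));
}

/* Parse two ASCII digits; false if either is not a digit.
   Short-circuiting stops at a '\0', so nothing past it is read. */
static bool two_digits(const char *p, uint8_t *out)
{
    if (p[0] < '0' || p[0] > '9' || p[1] < '0' || p[1] > '9')
        return false;
    *out = (uint8_t)((p[0] - '0') * 10 + (p[1] - '0'));
    return true;
}

/* Hypothetical helper: extract UTC time from "$GPRMC,hhmmss[.sss],S,..."
   into bcd[0..2] = seconds, minutes, hours (DS3231 register order).
   Returns false on any problem so the RTC is left untouched. */
bool gprmc_time_bcd(const char *s, uint8_t bcd[3])
{
    uint8_t hh, mm, ss;
    const char *p;

    if (strncmp(s, "$GPRMC,", 7) != 0)
        return false;
    p = s + 7;                               /* start of the time field */
    if (!two_digits(p, &hh) || !two_digits(p + 2, &mm) ||
        !two_digits(p + 4, &ss))
        return false;
    if (hh > 23 || mm > 59 || ss > 59)
        return false;
    p = strchr(p, ',');                      /* end of the time field */
    if (p == NULL || p[1] != 'A')            /* 'A' = valid fix, 'V' = not */
        return false;
    bcd[0] = to_bcd(ss);
    bcd[1] = to_bcd(mm);
    bcd[2] = to_bcd(hh);                     /* bit 6 clear = 24-hour mode */
    return true;
}
```

The three bytes could then be written to the RTC starting at register 00h; rejecting 'V' sentences keeps a GPS module without a fix from overwriting a good RTC time.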


The problem: the PIC locks up solid after a while of runtime. For some time, it would lock up after 10-60 minutes or so. I eventually moved the large-array functions outside of the Menu function, which seemed to cure the problem. Nope: it just made the lockup take about 2 days to manifest itself.



Info about the lockup:
- The LCD (based on the HD44780 controller) will show corruption - every few characters go blank, some random ASCII characters show up in some places, and the rest of it simply shows the last data received.

- The connection between the GPS module and the PIC goes to 1.8V. The GPS module runs on 3.3V, so that line is normally at either 3.3V or 0V when sending data. Why it sits at 1.8V during these lockups is anyone's guess, and I'm not certain whether this is a cause of the lockup or a symptom. It is not a component fault, though, since the problem occurs on multiple boards.

- While locked, the PIC does not respond to interrupts on the PORTB pins (provided by some pushbuttons on the board).

- The lockups occur at random times, so I don't know what the trigger would be.



Summary, and the exceedingly broad question: I'll be working with MPLAB and a REAL ICE debugger sometime in the next few days, but I'm rather new at it. What sort of thing am I looking for? Stack problems? Memory corruption? Gremlins? Should I come back after I've spent some quality time with the REAL ICE?


So much interesting stuff to learn... and all the expensive, fun toys are at work.


Most recent update: I happened upon this old post and noticed I had never posted what the problem actually was. After all the investigating of memory overflows and so on, I finally happened to have a REAL ICE connected to the darn thing when it locked up. It turns out it wasn't locked at all, but stuck in a loop, and a bad one to get stuck in.
Code:
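BYTE lcd_read_byte();   // prototype, since lcd_send_byte() uses it before its definition

void lcd_send_byte( BYTE address, BYTE n ) {

      lcd.rs = 0;
      // Wait for the HD44780 busy flag (bit 7) to clear.
      // No timeout: if the flag never clears, this loop never exits.
      while ( bit_test(lcd_read_byte(),7) ) ;
      lcd.rs = address;
      delay_cycles(1);
      lcd.rw = 0;
      delay_cycles(1);
      lcd.enable = 0;
      lcd_send_nibble(n >> 4);     // high nibble first (4-bit interface)
      lcd_send_nibble(n & 0xf);    // then the low nibble
}

BYTE lcd_read_byte() {
      BYTE low,high;
      set_tris_lcd(LCD_READ);      // data pins become inputs
      lcd.rw = 1;                  // read mode
      delay_cycles(1);
      lcd.enable = 1;
      delay_cycles(1);
      high = lcd.data;             // high nibble, includes the busy flag
      lcd.enable = 0;
      delay_cycles(1);
      lcd.enable = 1;
      delay_us(1);                 // per the post, this delay also restarted the WDT
      low = lcd.data;              // low nibble
      lcd.enable = 0;
      set_tris_lcd(LCD_WRITE);     // data pins back to outputs
      return( (high<<4) | low);
}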


The code would enter lcd_send_byte() to write to the display, and interrupts were disabled at the time.
It then reached while ( bit_test(lcd_read_byte(),7) ) ;. Each pass through lcd_read_byte() hit delay_us(1), which also restarted the WDT. The byte it returned still had bit 7 (the busy flag) set, so execution stayed in that while loop indefinitely.

So, with interrupts disabled, the PIC would not respond to button presses (on the three EXT interrupts), and the display would not update. It appeared to be locked up, but it was actually looping and restarting the WDT on every pass.
I'm not sure why it got stuck there, though; it seems the LCD kept telling the PIC it was still busy with... something.
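Why the watchdog never rescued the PIC can be shown with a small host-side model. This is an illustration in standard C, not PIC code: `wdt_tick()`, `restart_wdt_model()` and `WDT_TIMEOUT` are made-up stand-ins for the hardware watchdog. In CCS, the delay routines restart the WDT when `#use delay` is given the `restart_wdt` option, which is presumably why `delay_us(1)` did so here. The model shows that a loop whose body restarts the watchdog can spin forever without ever triggering a reset.

```c
#include <stdbool.h>

/* Model watchdog: "fires" once WDT_TIMEOUT ticks pass without a restart. */
#define WDT_TIMEOUT 100u

static unsigned wdt_count;

static void restart_wdt_model(void) { wdt_count = 0; }

/* Advance time by one tick; true means the chip would be reset now. */
static bool wdt_tick(void) { return ++wdt_count >= WDT_TIMEOUT; }

/* Model of while(bit_test(lcd_read_byte(),7)); against an LCD that
   always reports busy. read_kicks_wdt says whether the read routine
   restarts the watchdog (as delay_us(1) did in the post).
   Returns true if the watchdog fires within max_ticks iterations. */
bool wdt_fires_during_busy_wait(unsigned long max_ticks, bool read_kicks_wdt)
{
    unsigned long t;

    wdt_count = 0;
    for (t = 0; t < max_ticks; t++) {
        if (read_kicks_wdt)
            restart_wdt_model();   /* the restart hidden inside the delay */
        /* busy flag still set, so the loop goes around again */
        if (wdt_tick())
            return true;
    }
    return false;
}
```

A common design choice is to restart the watchdog in one place only, such as the top of the main loop, so that a wedged inner loop is allowed to time out and reset the chip.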

My change:
Code:

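void lcd_send_byte( BYTE address, BYTE n ) {

      int stuck_counter = 255;   // CCS int is 8-bit unsigned, so 255 is its maximum
      lcd.rs = 0;

      // Same busy-flag wait, but give up after at most 255 polls
      while (bit_test(lcd_read_byte(),7) && stuck_counter>0 )
      {
         stuck_counter--;
      }
      // If stuck_counter reached 0 the LCD never reported ready; the byte is sent anyway
      lcd.rs = address;
      delay_cycles(1);
      lcd.rw = 0;
      delay_cycles(1);
      lcd.enable = 0;
      lcd_send_nibble(n >> 4);
      lcd_send_nibble(n & 0xf);
}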


Just a simple counter of arbitrary size to kick it out of that loop. stuck_counter could probably be considerably smaller, but since I don't know why the LCD module keeps indicating that it's busy, I'll stick with 255. I added a little "catch" in there to flag every time the LCD got stuck: over multiple months of running, it got stuck in there a few times, but it kicked itself out each time without a visible hiccup.
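The counter fix can be generalized into a bounded wait that also tells the caller when it gave up, which folds the "catch" described above into the routine itself. This is a sketch in standard C under stated assumptions: `lcd_wait_ready()` is a hypothetical function that receives the status-read routine as a function pointer (on the PIC that would wrap `lcd_read_byte()`), and `fake_lcd_status()` is a stand-in that exists only so the logic can be exercised on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reads the LCD status byte; bit 7 is the HD44780 busy flag. */
typedef uint8_t (*lcd_status_fn)(void);

/* The "catch": how many times the wait timed out. */
static uint16_t lcd_stuck_events;

/* Poll the busy flag at most max_polls times.
   Returns true if the LCD reported ready, false on timeout. */
bool lcd_wait_ready(lcd_status_fn read_status, uint16_t max_polls)
{
    while (max_polls--) {
        if ((read_status() & 0x80u) == 0)
            return true;
    }
    lcd_stuck_events++;
    return false;
}

/* PC-only stand-in: reports busy for fake_busy_polls reads, then ready. */
static unsigned fake_polls;
static unsigned fake_busy_polls;

static uint8_t fake_lcd_status(void)
{
    return (uint8_t)((fake_polls++ < fake_busy_polls) ? 0x80u : 0x00u);
}
```

On a timeout the caller could simply carry on, as the fix above does, or re-initialize the display. Because the limit is a poll count, the time it represents depends on how long each `lcd_read_byte()` call takes at the chosen clock; a timer-based timeout is an alternative if that matters.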


Last edited by Jeff7 on Sun Apr 18, 2010 3:12 am; edited 2 times in total
PCM programmer



Joined: 06 Sep 2003
Posts: 21708


Posted: Thu Oct 08, 2009 9:21 pm

Read these threads:
http://www.ccsinfo.com/forum/viewtopic.php?t=39439
http://www.ccsinfo.com/forum/viewtopic.php?t=39494
http://www.ccsinfo.com/forum/viewtopic.php?t=27638&start=4
Jeff7



Joined: 22 Sep 2009
Posts: 13


Posted: Thu Oct 08, 2009 9:27 pm

Hm, I guess I was searching for the wrong keywords.

Reading....


Edit: Alright, after a fair amount of reading, I thought I'd found one problem: the PIC was running at 3.3V, as I mentioned, but when I checked the datasheet I had somehow looked at the speed graph for the 18LF4520, which indicated that 20MHz @ 3.3V would be fine. It turns out the 18F4520, which is what I have, is rated for a minimum of 4.2V.
So I upped the voltage to 5V and tested... nope. Crash.
'Twas time to bust out the REAL ICE.

Eventually, I found a problem, and it was where I suspected: In the checksum routines.


Previously, when the PIC read in only GPRMC strings (70 characters long), it would step through the data string until it reached a *, indicating that the next 2 characters were the checksum, or until it hit a numeric limit on the packet size.

However, I also wanted it to handle GPGSV strings, which are only 68 characters long (at most), and the array was sized for those.

The function and array were defined like this:
Function: int1 packet_checksum(int *verifyee);
Array: int comms_sat[3][69] = {0,0};
Call: packet_checksum(comms_sat[counter]); where counter is 0-2.

Snippet of code used to get the checksum, as well as the limits on the WHILE loop:
Code:
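// Original loop: XOR every character up to the '*'.
// Bug: each row of comms_sat is only 69 bytes, yet this allows indexes up to 70,
// and verifyee[csumcount] is read before csumcount<71 is tested, so index 71 is read too.
while(verifyee[csumcount]!= '*' && csumcount<71)
   {
      checksum ^= verifyee[csumcount];
      csumcount++;
   }
csumcount++;   // step past the '*' to the two checksum characters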


The problem came when the * was missing because of data corruption: the loop then kept reading past the end of the array that verifyee pointed into.
A comment on this page mentioned "Even reading a bad pointer may cause your program to crash in the future," and then linked to this page which further detailed that, including this rather familiar quote:
Quote:
In both cases, the program keeps on running, and then that memory corruption manifests itself as an "impossible" crash two hours later.


Another page said it would cause undefined program behavior, which could result in darn near anything happening.
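Why the overrun causes no immediate symptom can be seen from the array layout. A C two-dimensional array is one contiguous block, so `comms_sat[r][c]` is the element at offset `r*69 + c`. The arithmetic below (standard C; the helper names are hypothetical, and it only computes offsets, it never touches memory) shows that running past the end of row 0 or 1 silently reads the next row, while running past the end of row 2 reads memory outside `comms_sat` altogether. It also reflects a detail of the original loop: because `verifyee[csumcount] != '*'` is tested before `csumcount<71`, the last index actually read is 71.

```c
#include <stdbool.h>
#include <stddef.h>

#define ROWS 3    /* comms_sat[3][69] from the post */
#define COLS 69

/* Element offset of [row][col] from the start of the whole array.
   Pure arithmetic: col may exceed COLS to model an overrun. */
size_t flat_offset(size_t row, size_t col)
{
    return row * COLS + col;
}

/* True if [row][col] still lands somewhere inside comms_sat. */
bool inside_array(size_t row, size_t col)
{
    return flat_offset(row, col) < (size_t)ROWS * COLS;
}

/* True if col is a legal index within a single row. */
bool inside_row(size_t col)
{
    return col < COLS;
}
```

So a corrupted sentence in row 0 or 1 merely produces a wrong checksum from its neighbour's data, while one in row 2 reaches whatever the compiler placed after the array.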


Quick and dirty solution: I just bumped up the size of the 2D array.
Better solution: Use a const variable to assign the size of the arrays, and pass that value to packet_checksum();
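The "better solution" could look like this sketch. It is written in portable C rather than CCS C, so it uses explicit types instead of CCS `int` (which is 8 bits). The name `nmea_checksum_ok()` is hypothetical; it stands in for a `packet_checksum()` that also takes the buffer length, e.g. `packet_checksum(comms_sat[counter], COMMS_LEN)` with a hypothetical `#define COMMS_LEN 69`. It computes the standard NMEA checksum (XOR of every character between `$` and `*`), tests the bound before every read, and also stops at a string terminator, so a missing `*` makes it fail instead of reading past the row.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of one ASCII hex digit, or -1 if it is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* buf holds an NMEA sentence starting with '$'; buf_len is the size of
   the buffer (e.g. one row of comms_sat). Never reads buf[buf_len]. */
bool nmea_checksum_ok(const char *buf, size_t buf_len)
{
    unsigned char sum = 0;
    size_t i = 1;                          /* skip the leading '$' */
    int hi, lo;

    if (buf_len < 4 || buf[0] != '$')
        return false;
    while (i < buf_len && buf[i] != '*') { /* bound tested first */
        if (buf[i] == '\0')
            return false;                  /* string ended with no '*' */
        sum ^= (unsigned char)buf[i];
        i++;
    }
    if (i + 2 >= buf_len)                  /* '*' plus 2 digits must fit */
        return false;
    hi = hex_val(buf[i + 1]);
    lo = hex_val(buf[i + 2]);
    if (hi < 0 || lo < 0)
        return false;
    return sum == (unsigned char)((hi << 4) | lo);
}
```

Ordering the condition as `i < buf_len && buf[i] != '*'` matters: `&&` short-circuits, so the element at `buf[buf_len]` is never evaluated.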


So, I'm still running the test on it; it's run through the night thus far without a lockup, and I'm cautiously optimistic.


Update, October 16: It's been running non-stop for several days now without any problems.
All times are GMT - 6 Hours

 

