THE WAKE UP CHANNEL
#573
Re:Any way to get sources for ZTP 1.34? 2 Years, 4 Months ago Karma: 0
OK Alex. Those questions are there for a reason - each one of them has been a problem that shows up in real, wild environments. These are the types of things we've seen that make RTOS/TCP-IP stacks stumble. A lot. And not just Zilog stacks; we've seen it with SafeRTOS and embedded FreeBSD/Linux-type stacks as well.

I took a look at your website, and your code file is downloaded - but at this point we have to stick with what we have and keep maintaining it. It is far too expensive in engineering time to switch over and start from scratch, but it's good that other people reading this thread know about it.

Also, just for your feedback, and this is just a friendly observation: If I had to walk into a board meeting and explain to the directors about a new RTOS we wanted to test, and the website I was showing them had a bunch of photoshopped copyrighted images on its "Contact Us" page - they would laugh me out of the meeting. I know it's in good humor, but that page wouldn't help me convince my client that it's something they should spend money on testing. It would also help if you showed specs showing where your stack is faster or more reliable, etc. This is just feedback; it doesn't reflect on the quality of your RTOS code.

Also, you might consider asking the moderators here if you can post results of how well your RTOS stacks up (pardon) against the Zilog offerings?? Can you tell visitors to your website at a glance (without downloading code) if your RTOS+TCP offers snmp, arp, dhcp, ppp, smtp, pop, ftp, ssl, or xxxx protocols? How fast is it?? Etc. I think Zilog would want to sell chips, and if you have a better mousetrap that gives people a choice, that might be a good thing! I'm sure Zilog would be more than happy to show another RTOS running well on their MCU.
John Anderson (User)
Junior Boarder
Posts: 37
The administrator has disabled public write access.
 
#574
Re:Any way to get sources for ZTP 1.34? 2 Years, 4 Months ago Karma: 1
A comparison between HACK-RTOS and RZK wouldn't be a fair one. They are much too different. HACK-RTOS is written purely in Assembler; RZK mainly in C. HACK-RTOS allows efficient use of only 64k of RAM (ADL-0 mode), RZK - 16M (ADL-1 mode). HACK-RTOS is faster and more real-time, but at a cost. It is better in some applications, worse in others.

I know the look of my site is not appropriate for a respectable RTOS supplier. That's because I'm not one. No matter how my site looks, the truth wouldn't be convincing for directors eager to minimize risks and maximize profits. I have not been aiming that high!
Alex K (User)
Fresh Boarder
Posts: 19
 
#579
Re:Any way to get sources for ZTP 1.34? 2 Years, 4 Months ago Karma: 1
John, can you formulate a procedure for me, the easiest possible, to make the Zilog TCP stack stumble a lot?

Suppose I have a TCP server written with RZK & ZTP. There is no http or ftp, just a plain TCP server that echoes back to the client everything it gets. What could be done to make it fail in, say, 8 hours' time? Also, please, define "stumble a lot". I can think of a situation where even the best stack would stumble, if 90% of the TCP segments can't come through, for example.
Alex K (User)
Fresh Boarder
Posts: 19
 
#580
Re:Any way to get sources for ZTP 1.34? 2 Years, 4 Months ago Karma: 0
Actually, the precise details of our testing procedure are part of the IP my client owns and has paid a lot of money for, so I can't give you a "step by step".

However, in a nutshell, one test we do is to set up, say, between 50 and 220 units loaded with what we think is a good set of firmware on a hubbed (not switched) network and start sending in a lot of snmp traffic. One of the mistakes end-users make is to set up their snmp monitoring to send GETs far too often. For instance, if a remote network node is blocked, the snmp GETs are stored on the sending PC in a pool to resend. When that remote network node is re-enabled, our units get hit with a huge flood of snmp requests all at once. That's a fairly recent discovery.

Testing on many different router brands is always a challenge also. Cisco's "Store and Forward" system sometimes presents a challenge to some stacks, especially when we're doing remote firmware upgrades to the Zilog MCU.

The system must work with any goofy MTU size that might be on the network. Users might want an MTU of anything from 10 bytes up to 9k/15k superpackets. We've seen other stacks go down when trying to negotiate what MTU is going to be used. So that is one thing you want to test.

Some stacks go down when VLAN packets hit and they aren't expecting them, or when they are pinged over IPv6. Or when big ARP tables are generated.

One other test that seems to be a challenge is simply to have the systems running on a busy enterprise or public network and see whether they can run 24/7. Before releasing anything into the wild we try to see, say, 10 or 25 or 50 units run like this for at least 3 to 6 weeks, and be used by many different users in a lot of different setups simultaneously. The term "A Lot" means that the units under test will have an uptime of, say, a week or less before they hang. The newer ZTP 2.xx package will last a couple of days before it hangs - my client has no interest in troubleshooting the new stack for Zilog, so they've dropped it completely. This is running Zilog's own demo code. So that is why we choose to maintain the older ZTP 1.34 package.

When we see a failure, we have to see if it's something we did in our firmware running under the ZTP stack, or if it's the ZTP stack itself. Once we narrow the problem down we'll try to run some isolated code to see exactly what the problem is. It takes a while, and sometimes it's a matter of adjusting a thread priority here or there, etc. If it's in our code we fix it ourselves; if we find something on Zilog's side we let Zilog know.

For us, the systems that fail will almost always be on an unmanned oil platform 350 miles offshore and accessible only by a $$$ per minute helicopter flight. Or the board will be 500 feet up on an antenna tower, and the tower climber tech charges by the foot. Replacing the board in the field can be very, very expensive, so we try to design the system so that a power-on reset happens just once, even through a firmware upgrade. I.e., our systems are designed so that a backup MCU takes over while the main Web GUI MCU (Zilog) is getting the firmware upgrade.

In short, you want to test your system like it won't ever be physically accessible again, and a "Power On Reset" is NOT an option. And don't ever underestimate the level of incompetence of the end-user - try to make it as bulletproof as possible. Even that won't be enough but that's what you try for.
John Anderson (User)
Junior Boarder
Posts: 37
 
#585
Re:Any way to get sources for ZTP 1.34? 2 Years, 4 Months ago Karma: 1
Thanks for the suggestions, John. I'm not sure about 50 - 220 units on a hubbed network. I'm testing my one unit on an isolated network consisting of around 10 hosts.

Though I haven't been able to send IPv6 pings to the MCU (it appears that NDP encapsulates its messages in something other than RFC-894 Ethernet), I'm pretty sure they can do no damage to the stack. The first check there is to compare the Type field of the Ethernet header with 0800 (IP datagram) or 0806 (ARP). If it is something different, the frame gets deleted.

About VLAN packets, I just don't know where I can get some, but again, I can see the place where they are supposed to be filtered away. No way they could pass that place, unless the Zilog documentation is misleading on the subject.
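
The Type-field check described above could look roughly like the sketch below. This is not the actual HACK-RTOS or ZTP code; the names `classify_frame` and `frame_class_t` are made up for illustration, and it assumes untagged Ethernet II (RFC 894) framing with the Type field at byte offsets 12 and 13. Under that assumption, 802.1Q VLAN frames (type 0x8100) and IPv6 frames (type 0x86DD) fall through to the default case and are dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Well-known EtherType values. Only the first two are accepted here. */
enum {
    ETHERTYPE_IPV4 = 0x0800,
    ETHERTYPE_ARP  = 0x0806,
    ETHERTYPE_VLAN = 0x8100, /* 802.1Q tag, listed for reference only */
    ETHERTYPE_IPV6 = 0x86DD  /* listed for reference only */
};

#define ETH_HEADER_LEN 14u /* dst MAC (6) + src MAC (6) + Type (2) */

/* Hypothetical result type for this sketch. */
typedef enum { FRAME_DROP = 0, FRAME_IPV4, FRAME_ARP } frame_class_t;

/* Decide what to do with a received frame based only on its Type field. */
frame_class_t classify_frame(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);

    /* A runt frame cannot even hold the header: drop it before reading. */
    if (len < ETH_HEADER_LEN) {
        return FRAME_DROP;
    }

    /* The Type field is big-endian on the wire. */
    uint16_t type = (uint16_t)(((uint16_t)frame[12] << 8) | frame[13]);

    switch (type) {
    case ETHERTYPE_IPV4:
        return FRAME_IPV4;
    case ETHERTYPE_ARP:
        return FRAME_ARP;
    default:
        /* VLAN-tagged, IPv6, 802.3 length-style and anything else. */
        return FRAME_DROP;
    }
}
```

A check like this only shows that unknown types are rejected at one point in the receive path. It says nothing about what the Ethernet controller or driver does with such frames before the check is reached, which is where long-run testing comes in.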

When you speak of an MTU of 10 bytes, I presume you really mean an MSS of 10 bytes (TCP Maximum Segment Size). That's what is involved in negotiations when a TCP connection is established. Different MSS values have never been a problem. I usually use small ones like 191 bytes. That way it is possible to have about 10 TCP clients send & recv data to the MCU continuously and simultaneously without hindering each other. I was unable to achieve something comparable with ZTP. With ZTP, when I added a third TCP client (my TCP_Client.exe), transmissions became jerky because of packet losses and subsequent retransmissions.
I'm not sure about 9k/15k MSS. I thought that MTU of Ethernet is 1500 bytes, and MSS may not exceed it.
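
For readers following the MTU/MSS distinction, here is a rough sketch of how the two relate for TCP over IPv4. The function names are hypothetical, and the sketch assumes 20-byte IP and TCP headers with no options. The 536-byte fallback used when the peer sends no MSS option is the default from RFC 1122.

```c
#include <assert.h>
#include <stdint.h>

#define IPV4_HEADER_LEN 20u /* no IP options assumed */
#define TCP_HEADER_LEN  20u /* no TCP options assumed */
#define TCPIP_OVERHEAD  (IPV4_HEADER_LEN + TCP_HEADER_LEN)
#define TCP_DEFAULT_MSS 536u /* RFC 1122 default when no MSS option is seen */

static_assert(TCPIP_OVERHEAD == 40u, "sketch assumes option-free headers");

/* Largest TCP payload that fits in one IP datagram of the given MTU.
 * Returns 0 if the MTU cannot even hold the two headers. */
uint16_t mss_from_mtu(uint16_t mtu)
{
    if (mtu <= TCPIP_OVERHEAD) {
        return 0;
    }
    return (uint16_t)(mtu - TCPIP_OVERHEAD);
}

/* Segment size to use when sending: the smaller of what the peer
 * advertised and what the local link can carry.
 * peer_mss == 0 is used here to mean "peer sent no MSS option". */
uint16_t effective_send_mss(uint16_t peer_mss, uint16_t local_mtu)
{
    uint16_t local_limit = mss_from_mtu(local_mtu);
    uint16_t peer_limit = (peer_mss == 0) ? (uint16_t)TCP_DEFAULT_MSS : peer_mss;

    return (peer_limit < local_limit) ? peer_limit : local_limit;
}
```

With these assumptions, `mss_from_mtu(1500)` gives 1460 for standard Ethernet and `mss_from_mtu(9000)` gives 8960 for a 9k jumbo link, while `effective_send_mss(191, 1500)` stays at the 191 bytes mentioned above. An MTU of 10 bytes leaves no room for a TCP segment at all, so the sketch returns 0 there.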

Your testing environment looks very tempting to me. It's a pity I don't have access to something like that.
Alex K (User)
Fresh Boarder
Posts: 19
 
#586
Re:Any way to get sources for ZTP 1.34? 2 Years, 4 Months ago Karma: 0
That's MTU, not MSS, Alex. Over RF-based Ethernet links you'll have MTUs of almost any size. 10 or 64 or 100 bytes can happen also on a noisy RF link. On GigE or fiber nets you'll run into jumbo packets (usually around 9k) or up to 64k (super jumbo packets). Sometimes we have the usual 1500 or 1522 bytes (VLAN) but the stack has to play nice no matter what MTU is in use - or trying to be used. It's not that the Zilog has to handle a 64k packet for comms, but it has to not get stuck if the equipment it's connected to is trying to do that for some reason. We've seen that happen.

There really isn't any PC-based "load-stress tester" app that we've found that can simulate a real-world wild network. Those programs are a start (and help when you're trying to zero in on one specific problem) but they aren't a substitute for reality.

Why do we test for at least 3~4 weeks? Because besides solid operation on the 'Net we're also looking for any rollover problem in the time keeping mechanisms. We've seen problems with stacks - and our own software - and with GPS receivers - that fail after 24.85 days of uptime. That's because a 32-bit signed long counter, incremented every 1 ms (very, very common), overflows at exactly that point, and the system fails if it doesn't handle the rollover correctly. An unsigned long counter will fail after 49.7 days if the system isn't designed for that. Even Trimble has this problem on some of their newer GPS receivers, which they weren't even aware of until we started doing extended testing last year - on the 25th day, the GPS receivers on all of our units would develop all sorts of random features just like they had a bad memory pointer. And that's exactly what the problem was.
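
The 24.85 and 49.7 day figures follow from 2^31 ms and 2^32 ms respectively. The sketch below reproduces that arithmetic and shows one common way to make a timeout test survive the wrap: compare elapsed time computed with unsigned subtraction instead of comparing against an absolute deadline. All function names are hypothetical, and nothing here is taken from ZTP or any vendor's firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MS_PER_DAY 86400000.0 /* 24 * 60 * 60 * 1000 */

/* Uptime in days at which a 1 ms tick counter overflows. */
double signed_rollover_days(void)   { return 2147483648.0 / MS_PER_DAY; } /* 2^31 ms */
double unsigned_rollover_days(void) { return 4294967296.0 / MS_PER_DAY; } /* 2^32 ms */

/* Fragile: the absolute deadline start_ms + timeout_ms can wrap past zero,
 * after which "now >= deadline" is true immediately. */
bool timeout_elapsed_naive(uint32_t now_ms, uint32_t start_ms, uint32_t timeout_ms)
{
    uint32_t deadline = (uint32_t)(start_ms + timeout_ms);
    return now_ms >= deadline;
}

/* Wrap-safe: unsigned subtraction is modulo 2^32, so the elapsed time is
 * correct across one rollover as long as the real interval is < 2^32 ms. */
bool timeout_elapsed(uint32_t now_ms, uint32_t start_ms, uint32_t timeout_ms)
{
    uint32_t elapsed = (uint32_t)(now_ms - start_ms);
    return elapsed >= timeout_ms;
}

/* Small self-check around the wrap point, 256 ms before the counter rolls over. */
void rollover_self_check(void)
{
    const uint32_t start = 0xFFFFFF00u;

    /* 16 ms after start: a 1000 ms timeout must not have fired yet. */
    assert(!timeout_elapsed(0xFFFFFF10u, start, 1000u));
    /* The naive version fires early because its deadline wrapped to 744. */
    assert(timeout_elapsed_naive(0xFFFFFF10u, start, 1000u));
    /* 1000 ms after start the counter reads 744 and the timeout is due. */
    assert(timeout_elapsed(744u, start, 1000u));
}
```

Calling `rollover_self_check()` on a bench build exercises the wrap in microseconds, but it only covers code written this way. A counter buried in a vendor library or a GPS module still needs the real 25 or 50 days to show itself, unless the tick counter can be preloaded close to the rollover value for testing.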

So, Alex, you can't really test a stack for reliable operation in 8 hrs. It takes much, much longer than that to really look for those "gotcha" problems. You can look at the code and swear nothing could possibly break it, and then you put it on a wild network and get humbled every time.
John Anderson (User)
Junior Boarder
Posts: 37
 