Our network consists mainly of LANs housed in two separate buildings linked by fiber optic cable.
Recently we had a power failure, and since then we have been experiencing latency on building-to-building traffic. The network inside each building is fine (no latency issues), but anything that crosses between the two buildings is slow (for example, transferring files from one building to the other). I suspect the main fiber connecting the two buildings is the cause, but a switch or a bad NIC could produce the same symptoms. That is why I first want to make sure that no equipment on our side is causing the latency before I ask our provider to check the fiber connectivity.
With that in mind, may I ask the gurus for pointers on how to go about addressing the issue? Any methods or tools that would help in troubleshooting would be greatly appreciated.
First you'll probably want to check buffer and error counts on the backbone (fiber) interfaces, if these are under your administrative control. Also, if there are multiple routed hops between end systems, one in each building, try a traceroute to pinpoint the bottleneck. It's difficult to provide much more help without a more detailed description of the network, though.
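As a rough illustration of the error-count check, here is a minimal Python sketch that compares two snapshots of an interface's counters taken a few minutes apart and reports which ones grew. The counter names (crc_errors, input_errors, collisions) and the numbers are made up for the example; your switch will have its own names, and you would have to copy the values in by hand or by whatever management interface the switch offers.

```python
def counter_deltas(before, after):
    """Return the counters that increased between two snapshots.

    before, after: dicts mapping a counter name to its value.
    Counters missing from either snapshot are skipped.
    """
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# Hypothetical snapshots of the switch port facing the media converter.
before = {"crc_errors": 10, "input_errors": 12, "collisions": 0}
after = {"crc_errors": 57, "input_errors": 60, "collisions": 0}

print(counter_deltas(before, after))
```

A counter that keeps climbing on the port facing the media converter, while the ports facing local hosts stay flat, would point at the inter-building link rather than at the LAN.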
Thanks for the reply. Our setup is actually pretty simple. The two buildings are connected by fiber, which is terminated at each end by a media converter plugged into a switch. The link just extends one building's LAN into the other building, so there are no additional hops; it is all one LAN.
When I ping a host in the other building, the replies show a time of <1ms, but every few minutes I get a single "request timed out" (RTO), and then the cycle repeats. There seems to be a pattern to the RTOs.
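One way to test whether the RTOs really follow a pattern is to log a long ping run and look at the spacing between timeouts. The Python sketch below is only an illustration under a few assumptions: the ping results have already been turned into a list of True/False values (one per probe), the probes were sent at a fixed interval (1 second here), and "periodic" just means the gaps vary by no more than 10% of their average. The function names and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def timeout_gaps(results, interval_s=1.0):
    """Return the seconds elapsed between consecutive lost probes.

    results: one boolean per probe, True if a reply came back.
    interval_s: assumed spacing between probes, in seconds.
    """
    lost_at = [i * interval_s for i, ok in enumerate(results) if not ok]
    return [b - a for a, b in zip(lost_at, lost_at[1:])]


def looks_periodic(gaps, tolerance=0.1):
    """True if the gaps deviate little from their mean (relative spread)."""
    if len(gaps) < 2:
        return False  # too few timeouts to judge
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg <= tolerance


# Hypothetical run: 600 probes with timeouts at probes 120, 240, 360 and 480.
sample = [i not in (120, 240, 360, 480) for i in range(600)]
gaps = timeout_gaps(sample)
print(gaps, looks_periodic(gaps))
```

Evenly spaced timeouts would suggest something scheduled or timer-driven; irregular gaps would fit a marginal link or failing hardware better.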
Would Ethereal show me where the latency resides? I am trying to capture packets, but I am stuck because I cannot understand some of its output.
Hmm. You had a power outage then when it all came back you discovered this problem that wasn't there before. And you feel there might be a pattern to the RTOs...
This isn't scientific, just gut feeling, but what happens sometimes when you get a power outage is that servers and machines come back up in a different state. Is it possible that there is some regular, scheduled thing that used to run on your network and then was stopped (perhaps ages ago) and now, following a reboot, it's running again? Perhaps a monitoring job or a copy job or something?
If it's not anything like that then I'm with l0gic: check for errors on the switch ports that lead to your media converters.
What TheBishop has said is not baseless. Most scheduling programs work on the TSR (terminate-and-stay-resident) principle and are activated from time to time by timers or on demand. But I have seen many such schedules run intermittently, or stop working altogether for some reason. When that happens, the network resources they were using are freed up and you get better performance.
Now, when all your servers come back to life after a power outage, it's as if all the dormant services have started running again. The outage acts like a power reset on each server and gives everything a fresh start, and those services might now be gobbling up your network bandwidth.
You should check every point of interchange between your optical backbone and the local network in each building. Measure the RTT for packets between the buildings, find out where the bottleneck is (if there is one), and also check for packet loss. Set up Ethereal at either end and capture traffic. Go through the capture files to see whether there are many retransmissions at the TCP layer; that is a strong sign of packet loss, with TCP doing damage control by resending the lost segments. Retransmissions themselves create bottlenecks, since too many of them clog up the link. Also check any routers at either end of the optical backbone and review their settings. Make sure they are not rejecting packets beyond a certain size or blocking traffic because of a firewall setting.
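To give an idea of what counting retransmissions in a capture means, here is a minimal Python sketch. It assumes you have already exported the TCP segments from the capture into tuples of (source, destination, source port, destination port, sequence number, payload length); how you export them is up to you, and the addresses below are invented. It simply counts data segments whose sequence number was already seen on the same flow, so it ignores subtleties such as keep-alives and sequence number wraparound.

```python
from collections import Counter


def count_retransmissions(segments):
    """Count data segments whose (flow, seq) pair was already seen.

    segments: iterable of (src, dst, sport, dport, seq, payload_len).
    Segments with no payload (pure ACKs) are skipped, because they
    legitimately repeat the same sequence number.
    """
    seen = set()
    retrans = Counter()
    for src, dst, sport, dport, seq, payload_len in segments:
        if payload_len == 0:
            continue
        flow = (src, dst, sport, dport)
        if (flow, seq) in seen:
            retrans[flow] += 1
        else:
            seen.add((flow, seq))
    return retrans


# Hypothetical segments from a file copy between the two buildings.
capture = [
    ("10.0.1.5", "10.0.2.9", 1040, 445, 1000, 1460),
    ("10.0.1.5", "10.0.2.9", 1040, 445, 2460, 1460),
    ("10.0.2.9", "10.0.1.5", 445, 1040, 500, 0),      # pure ACK
    ("10.0.1.5", "10.0.2.9", 1040, 445, 2460, 1460),  # sent again
    ("10.0.2.9", "10.0.1.5", 445, 1040, 500, 0),      # pure ACK
]

print(count_retransmissions(capture))
```

If the count is high for flows that cross between the buildings and close to zero for flows that stay inside one building, that supports the idea that packets are being lost on the inter-building path.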
Whatever you find to be the cause of this latency, please let us know as well. It serves as a learning experience for all of us.