If you run Cisco routers, you will probably hit this problem at some point, or you already have. It happened to me three days ago. One of our routers suddenly started misbehaving: pings were intermittent, even to the loopback IP, and SSH was very hard to get into. After several attempts the login prompt finally appeared, but authentication failed because the connection to the TACACS server was affected as well. It was quite a hassle, especially since I was given only 30 minutes to analyze the problem and find a solution. For context, the router is a Cisco 7600.
Within the first minute I had identified the problem: CPU utilization was extremely high, reaching 99%. I saw this in the monitoring tool I use, so I did not have to struggle to log in to the router, which was very difficult in that state. With the symptom known, the next step was to find the cause of the “High CPU Utilization”. Here is what you can do if you run into the same problem.
Identifying the Problem
SHOW PROCESS CPU NOTIFICATIONS (if any)
Router#show process cpu sorted
This command shows the CPU utilization, the processes that are using the CPU, and the interrupt percentage. The overall figures are on the first line of the output.
router#sh proc cpu sort
CPU utilization for five sec: 99%/54%; one minute: 99%; five minutes: 99%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
77 11697800 822858 14216 38.52% 37.51% 37.71% 0 IP Input
128 702920 714062 984 2.47% 2.57% 2.50% 0 Tag Input
Explanation:
CPU utilization for five seconds: x%/y%; one minute: a%; five minutes: b%
Total CPU Utilization: x%
Process Utilization: (x - y)%
Interrupt Utilization: y%
Process Utilization is the difference between the Total and Interrupt values (x minus y). The one-minute and five-minute utilizations are exponentially decayed averages (rather than arithmetic averages), so recent values have more influence on the calculated average.
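The split between total, interrupt, and process utilization described above can be computed automatically from the first line of the output. The following is a minimal Python sketch, not an official tool: the function name parse_cpu_header and the dictionary keys are my own, and the pattern assumes the header looks like the sample in this post (it accepts both "five sec" and "five seconds").

```python
import re

# Matches the first line of "show process cpu", for example:
# CPU utilization for five sec: 99%/54%; one minute: 99%; five minutes: 99%
HEADER_RE = re.compile(
    r"CPU utilization for five sec(?:onds)?:\s*(\d+)%/(\d+)%;"
    r"\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


def parse_cpu_header(line):
    """Split the header into total, interrupt and process utilization.

    The name and the returned keys are hypothetical, chosen for this sketch.
    """
    m = HEADER_RE.search(line)
    if m is None:
        raise ValueError("not a CPU utilization header: %r" % line)
    total, interrupt, one_min, five_min = (int(v) for v in m.groups())
    return {
        "total": total,                 # x
        "interrupt": interrupt,         # y
        "process": total - interrupt,   # x - y
        "one_minute": one_min,          # a
        "five_minutes": five_min,       # b
    }


if __name__ == "__main__":
    sample = ("CPU utilization for five sec: 99%/54%; "
              "one minute: 99%; five minutes: 99%")
    print(parse_cpu_header(sample))
```

For the header shown earlier (99%/54%), this gives 54% at interrupt level and 45% at process level.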
SHOW LOG
Run “show logging” to view the router's log; it may contain useful information that helps identify the problem and find its root cause.
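When the log is long, it can help to filter it for the most severe messages first. The sketch below is only an illustration in Python: it relies on the standard IOS message tag %FACILITY-SEVERITY-MNEMONIC (severity 0 is the most severe, 7 the least), the function name severe_messages and the default cut-off of 3 are my own choices, and the sample log lines are made up for the example rather than taken from the router in this post.

```python
import re

# Standard IOS message tag: %FACILITY-SEVERITY-MNEMONIC: text
# Assumption: facility and mnemonic use upper-case letters, digits and "_".
MSG_RE = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+):\s*(.*)")


def severe_messages(log_text, max_severity=3):
    """Return (severity, facility, mnemonic, text) tuples, most severe first.

    Only messages with severity <= max_severity are kept
    (a lower number means a more severe message).
    """
    hits = []
    for line in log_text.splitlines():
        m = MSG_RE.search(line)
        if m and int(m.group(2)) <= max_severity:
            hits.append((int(m.group(2)), m.group(1), m.group(3), m.group(4)))
    return sorted(hits, key=lambda h: h[0])


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = """\
*Mar  1 00:01:02.003: %SYS-5-CONFIG_I: Configured from console by console
*Mar  1 00:02:10.120: %LINK-3-UPDOWN: Interface GigabitEthernet1/1, changed state to down
*Mar  1 00:02:11.500: %SYS-2-MALLOCFAIL: Memory allocation of 1024 bytes failed
"""

if __name__ == "__main__":
    for sev, facility, mnemonic, text in severe_messages(SAMPLE_LOG):
        print(sev, facility, mnemonic, text)
```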
Impact of High CPU Utilization
– Input queue drops
– Slow performance
– Slow response in Telnet, or unable to Telnet to the router
– Slow response on the console
– Slow or no response to ping
– Router doesn’t send routing updates
Possible Causes of High CPU Utilization
The “show process cpu” output showed that IP Input was using a large share of the CPU, so the cause was likely related to IP Input. Even so, I did not rule out other causes. Things that can cause High CPU Utilization include:
Hardware failure
My guess is that the hardware failures that can cause High CPU Utilization are an RSP failure and/or a VIP failure.
Configuration
Based on the “show process cpu” output, the problem was probably related to IP Input, so it could be caused by the fast switching configuration, TCP intercept, the use of IP NAT, and so on. It could also be someone launching a DoS attack.
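The observation that IP Input is the dominant process comes from reading the process table of “show process cpu sorted”. As a rough illustration, the Python sketch below extracts the per-process rows and picks the one with the highest 5-second figure. The function names are hypothetical, and the parser assumes the nine-column layout shown in this post (PID, Runtime, Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process); rows whose columns run together in real output would be skipped.

```python
def parse_process_rows(output):
    """Parse the per-process rows of 'show process cpu' output.

    Assumes nine whitespace-separated columns, with the process name last
    (the name itself may contain spaces, e.g. "IP Input").
    """
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 8)
        # Data rows start with a numeric PID; header lines do not.
        if len(parts) == 9 and parts[0].isdigit():
            rows.append({
                "pid": int(parts[0]),
                "five_sec": float(parts[4].rstrip("%")),
                "one_min": float(parts[5].rstrip("%")),
                "five_min": float(parts[6].rstrip("%")),
                "process": parts[8].strip(),
            })
    return rows


def top_process(output):
    """Return the row with the highest 5-second utilization, or None."""
    rows = parse_process_rows(output)
    return max(rows, key=lambda r: r["five_sec"]) if rows else None


SAMPLE = """\
CPU utilization for five sec: 99%/54%; one minute: 99%; five minutes: 99%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
77 11697800 822858 14216 38.52% 37.51% 37.71% 0 IP Input
128 702920 714062 984 2.47% 2.57% 2.50% 0 Tag Input
"""

if __name__ == "__main__":
    top = top_process(SAMPLE)
    print(top["process"], top["five_sec"])
```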
Troubleshooting Steps
Hardware failure
a. RSP Failure
If the RSP is the problem, just replace it :D. My advice, though, is to keep this as a last resort, because the router has to be powered off to replace the RSP. The total estimated downtime is about 15 minutes. A 7600 router takes roughly 10-15 minutes to boot, including routing updates and so on.
b. VIP Failure
To check whether a VIP is the problem, you can run:
Router#sh controllers vip all proc cpu | i utilization
CPU utilization for five seconds: 1%/1%; one minute: 1%; five minutes: 1%
CPU utilization for five seconds: 5%/5%; one minute: 5%; five minutes: 5%
CPU utilization for five seconds: 21%/21%; one minute: 21%; five minutes: 21%
CPU utilization for five seconds: 19%/19%; one minute: 20%; five minutes: 19%
If you want to see the per-process details, run “sh controllers vip vip_number proc cpu”.
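With many VIPs, scanning the utilization lines by eye gets tedious. Here is a small Python sketch that flags the busy ones. Note the assumptions: the function name busy_vips and the 80% threshold are arbitrary choices of mine, and because the filtered output carries no slot numbers, the sketch can only report the position of each line in the output, not the actual VIP slot.

```python
import re

# Matches the start of each utilization line, e.g.
# CPU utilization for five seconds: 21%/21%; one minute: 21%; ...
UTIL_RE = re.compile(r"CPU utilization for five sec(?:onds)?:\s*(\d+)%/(\d+)%")


def busy_vips(output, threshold=80):
    """Return (position, total, interrupt) for lines at or above threshold.

    position is the 1-based order of the line in the output,
    which is not necessarily the VIP slot number.
    """
    busy = []
    for pos, m in enumerate(UTIL_RE.finditer(output), start=1):
        total, interrupt = int(m.group(1)), int(m.group(2))
        if total >= threshold:
            busy.append((pos, total, interrupt))
    return busy


SAMPLE = """\
CPU utilization for five seconds: 1%/1%; one minute: 1%; five minutes: 1%
CPU utilization for five seconds: 5%/5%; one minute: 5%; five minutes: 5%
CPU utilization for five seconds: 21%/21%; one minute: 21%; five minutes: 21%
CPU utilization for five seconds: 19%/19%; one minute: 20%; five minutes: 19%
"""

if __name__ == "__main__":
    print(busy_vips(SAMPLE))                # default threshold of 80%
    print(busy_vips(SAMPLE, threshold=20))  # lower threshold for comparison
```

In the output shown above, none of the four VIPs comes close to the 80% threshold, which suggests the VIPs were not the source of the load.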
Configuration
TRY THIS: If IP Input is consuming the CPU, one of the following might be the cause:
– Fast switching is disabled on an interface (or interfaces) that has a lot of outgoing traffic. Examine the output of the ‘show interfaces switching’ command to see which interface is burdened with traffic. Re-enable fast switching on that interface.
– TCP Intercept is enabled. TCP Intercept requires process switching for all packets during session set-up.
– Fast switching is disabled on an interface which supports more than one network and is routing traffic between them. This can occur when an interface has one or more secondary network addresses configured.
INFO: The router will process switch all packets sourced from the interface and destined to host(s) off the same interface which is a CPU-intensive task. Use the ‘ip route-cache same-interface’ interface configuration command to allow packets to be fast switched on the same interface.
– Traffic that can’t be fast switched is arriving. This could be any of the following types of traffic:
* Packet for which there is no entry yet in the switching cache.
INFO: If a device in the network is generating packets at an extremely high rate for destinations reachable through the router, and is using different source or destination IP addresses, there won’t be a match for these packets in the switching cache, so they will be processed by the IP Input process. The source can be a malfunctioning device or a device attempting a Denial-of-Service (DoS) attack.
* Packets destined for the router (i.e., routing updates or a spoof attack)
* IP packets with options
* Packets that require protocol translation
* Multilink PPP
* Packets that require policy routing.
INFO: IOS versions 11.3 and higher allow policy-routed packets to be fast switched. Use the ‘ip route-cache policy’ interface configuration command to allow policy-routed packets to be fast switched.
* Packets going through serial interfaces with X.25 encapsulation. In the X.25 protocol suite, flow control is implemented in layer 2 of the OSI model.
* Compressed traffic. If there’s no Compression Service Adapter (CSA) in the router, compressed packets must be process-switched.
* Encrypted traffic. If there’s no Encryption Service Adapter (ESA) in the router, encrypted packets must be process-switched.
– A lot of packets, arriving at an extremely high rate, for a destination in a directly attached subnet, for which there is no entry in the ARP table. This shouldn’t happen with TCP traffic, because of the windowing mechanism, but it can happen with UDP traffic.
– A lot of multicast traffic going through the router. Unfortunately, there’s no easy way to examine the amount of multicast traffic. If you’ve configured multicast routing on the router, you can enable fast switching of multicast packets using the ‘ip mroute-cache’ interface configuration command (fast switching of multicast packets is off by default).
– A lot of broadcast traffic. Check the number of broadcast packets in the ‘show interfaces’ command output.
– Too much traffic is passing through the router. If the router is over-used and is incapable of handling this amount of traffic, try distributing the load among other routers or consider purchasing a high-end router.
– IP NAT is configured on the router and there are lots of DNS packets going through the router. UDP or TCP packets with source and/or destination port 53 (DNS) are always punted to process level by NAT.
– Check who’s logged on to the router and what they are doing. If someone is logged on and is issuing commands that produce long output, the high CPU utilization by the IP input process will be followed by a much higher CPU utilization by the virtual EXEC process.
– Make sure all debugging commands in your router are turned off by issuing the undebug all or no debug all command.
– Check for a possible security issue. Commonly, high CPU utilization is caused by a security issue, such as a worm or virus operating in your network. Usually, a configuration change, such as adding additional lines to your access lists can mitigate the effects of this problem. Check the Cisco Product Security Advisories and Notices for information on the most likely causes and specific workarounds.