Dear all,
I have a problem with a BGP peering between an SRX220 and an MX10. At least once every two hours the peer goes down due to a "Hold Timer Expired Error". I don't know how to identify the event that causes this 'periodic' interruption; I don't see link flaps or packet loss.
Below are the log messages and the relevant configuration from both devices.
SRX Config:
set interfaces ge-0/0/0 description TRUNK
set interfaces ge-0/0/0 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/0 unit 0 family ethernet-switching vlan members VLAN-551
set interfaces vlan unit 551 family inet address 10.0.5.11/29
set vlans VLAN-551 vlan-id 551
set vlans VLAN-551 l3-interface vlan.551
set routing-instances VRF-VLAN551 instance-type virtual-router
set routing-instances VRF-VLAN551 interface vlan.551
set routing-instances VRF-VLAN551 protocols bgp family inet unicast
set routing-instances VRF-VLAN551 protocols bgp local-as 3597
set routing-instances VRF-VLAN551 protocols bgp group CORE neighbor 10.0.5.9 description PEER-VLAN551
set routing-instances VRF-VLAN551 protocols bgp group CORE neighbor 10.0.5.9 local-address 10.0.5.11
set routing-instances VRF-VLAN551 protocols bgp group CORE neighbor 10.0.5.9 import rm-import
set routing-instances VRF-VLAN551 protocols bgp group CORE neighbor 10.0.5.9 export rm-export
set routing-instances VRF-VLAN551 protocols bgp group CORE neighbor 10.0.5.9 peer-as 3597
MX10 Config:
set interfaces ae0 unit 551 description VLAN551
set interfaces ae0 unit 551 vlan-id 551
set interfaces ae0 unit 551 family inet address 10.0.5.9/29
set interfaces ae0 unit 551 family iso
set protocols bgp group GIOL-VRFINST type internal
set protocols bgp group GIOL-VRFINST family inet unicast
set protocols bgp group GIOL-VRFINST cluster 5.5.5.5
set protocols bgp group GIOL-VRFINST peer-as 3597
set protocols bgp group GIOL-VRFINST neighbor 10.0.5.11 local-address 10.0.5.9
set protocols bgp group GIOL-VRFINST neighbor 10.0.5.11 import rm-import
set protocols bgp group GIOL-VRFINST neighbor 10.0.5.11 export rm-export
Example of the log messages during one event; this happens once every 2 or 3 hours.
MX10
Jan 4 18:21:59.954957 bgp_hold_timeout:4055: NOTIFICATION sent to 10.0.5.11 (Internal AS 3597): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 10.0.5.11 (Internal AS 3597), socket buffer sndcc: 57 rcvcc: 0 TCP state: 4, snd_una: 3256096123 snd_nxt: 3256096180 snd_wnd: 16384 rcv_nxt: 3443979671 rcv_adv: 3443996055, hold timer out 90s, hold timer remain 0s
Jan 4 18:21:59.955057 bgp_peer_close: closing peer 10.0.5.11 (Internal AS 3597), state is 7 (Established)
Jan 4 18:21:59.955107 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state Established event HoldTime new state Idle
Jan 4 18:22:00.172348 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state Idle event Start new state Active
Jan 4 18:22:32.173849 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state Active event ConnectRetry new state Connect
Jan 4 18:23:47.173435 bgp_connect_complete: error connecting to 10.0.5.11 (Internal AS 3597): Socket is not connected
Jan 4 18:23:47.173584 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state Connect event OpenFail new state Idle
Jan 4 18:23:47.173966 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state Idle event Start new state Connect
Jan 4 18:23:47.173996 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state Connect event ConnectRetry new state Connect
Jan 4 18:24:56.423226 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state Connect event Open new state OpenSent
Jan 4 18:24:56.424201 advertising graceful restart receiving-speaker-only capability to neighbor 10.0.5.11 (Internal AS 3597)
Jan 4 18:24:56.430079 advertising graceful restart receiving-speaker-only capability to neighbor 10.0.5.11 (Internal AS 3597)
Jan 4 18:24:56.430136
Jan 4 18:24:56.430136 BGP SEND 10.0.5.9+179 -> 10.0.5.11+63774
Jan 4 18:24:56.430176 BGP SEND message type 1 (Open) length 59
Jan 4 18:24:56.430202 BGP SEND version 4 as 3597 holdtime 90 id 168.96.6.10 parmlen 30
Jan 4 18:24:56.430310 BGP SEND MP capability AFI=1, SAFI=1
Jan 4 18:24:56.430334 BGP SEND Refresh capability, code=128
Jan 4 18:24:56.430354 BGP SEND Refresh capability, code=2
Jan 4 18:24:56.430377 BGP SEND Restart capability, code=64, time=120, flags=
Jan 4 18:24:56.430400 BGP SEND 4 Byte AS-Path capability (65), as_num 3597
Jan 4 18:24:56.430433
Jan 4 18:24:56.430433 BGP SEND 10.0.5.9+179 -> 10.0.5.11+63774
Jan 4 18:24:56.430470 BGP SEND message type 3 (Notification) length 21
Jan 4 18:24:56.430492 BGP SEND Notification code 6 (Cease) subcode 7 (Connection collision resolution)
Jan 4 18:24:56.443023 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state OpenSent event RecvOpen new state OpenConfirm
Jan 4 18:24:56.443137 bgp_read_message: 10.0.5.11 (Internal AS 3597): 0 bytes buffered
Jan 4 18:24:56.456538 bgp_event: peer 10.0.5.11 (Internal AS 3597) old state OpenConfirm event RecvKeepAlive new state Established
SRX
Jan 4 18:21:59.970738 bgp_read_v4_message:10642: NOTIFICATION received from 10.0.5.9 (Internal AS 3597): code 4 (Hold Timer Expired Error), socket buffer sndcc: 57 rcvcc: 0 TCP state: 5, snd_una: 3443979671 snd_nxt: 3443979728 snd_wnd: 16384 rcv_nxt: 3256096202 rcv_adv: 3256112565, hold timer out 90s, hold timer remain 41.093555s
Jan 4 18:21:59.971008 bgp_peer_close: closing peer 10.0.5.9 (Internal AS 3597), state is 7 (Established)
Jan 4 18:21:59.971247 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Established event RecvNotify new state Idle
Jan 4 18:21:59.976374 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Idle event Start new state Active
Jan 4 18:22:31.973760 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Active event ConnectRetry new state Connect
Jan 4 18:23:46.973987 bgp_connect_complete: error connecting to 10.0.5.9 (Internal AS 3597): Socket is not connected
Jan 4 18:23:46.974366 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Connect event OpenFail new state Idle
Jan 4 18:23:46.977161 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Idle event Start new state Connect
Jan 4 18:23:46.977342 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Connect event ConnectRetry new state Connect
Jan 4 18:24:56.438429 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Connect event Open new state OpenSent
Jan 4 18:24:56.438722 advertising graceful restart receiving-speaker-only capability to neighbor 10.0.5.9 (Internal AS 3597)
Jan 4 18:24:56.443393 bgp_pp_recv:3396: NOTIFICATION sent to 10.0.5.9 (Internal AS 3597): code 6 (Cease) subcode 7 (Connection collision resolution), Reason: dropping 10.0.5.9 (Internal AS 3597), connection collision prefers 10.0.5.9+51507 (proto)
Jan 4 18:24:56.443984 bgp_peer_close: closing peer 10.0.5.9 (Internal AS 3597), state is 4 (OpenSent)
Jan 4 18:24:56.444535 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state OpenSent event Stop new state Idle
Jan 4 18:24:56.445281 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Idle event Start new state Active
Jan 4 18:24:56.448441 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state Active event Open new state OpenSent
Jan 4 18:24:56.448638 advertising graceful restart receiving-speaker-only capability to neighbor 10.0.5.9 (Internal AS 3597)
Jan 4 18:24:56.449074 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state OpenSent event RecvOpen new state OpenConfirm
Jan 4 18:24:56.460226 bgp_event: peer 10.0.5.9 (Internal AS 3597) old state OpenConfirm event RecvKeepAlive new state Established
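As a rough aid for reading the TCP counters in the two Hold Timer Expired NOTIFICATION lines above, here is a minimal Python sketch (it is not device output). It subtracts snd_una from snd_nxt to get the bytes each side had sent but not yet seen acknowledged. The helper name unacked_bytes is hypothetical, and treating those bytes as queued keepalives is only an assumption, based on a BGP KEEPALIVE being a bare 19-byte header.

```python
import re

# A BGP KEEPALIVE message is just the 19-byte BGP header (RFC 4271).
KEEPALIVE_LEN = 19


def unacked_bytes(log_line: str) -> int:
    """Return snd_nxt - snd_una taken from a BGP NOTIFICATION trace line.

    Hypothetical helper: it only looks for the 'snd_una:' and 'snd_nxt:'
    fields as they appear in the log lines quoted in this post.
    """
    una = int(re.search(r"snd_una: (\d+)", log_line).group(1))
    nxt = int(re.search(r"snd_nxt: (\d+)", log_line).group(1))
    # TCP sequence numbers are 32-bit and wrap around.
    return (nxt - una) % 2**32


# Counter fragments copied from the MX10 and SRX NOTIFICATION lines.
mx = "sndcc: 57 rcvcc: 0 TCP state: 4, snd_una: 3256096123 snd_nxt: 3256096180"
srx = "sndcc: 57 rcvcc: 0 TCP state: 5, snd_una: 3443979671 snd_nxt: 3443979728"

for name, line in (("MX10", mx), ("SRX", srx)):
    pending = unacked_bytes(line)
    # Assumption: the pending bytes are keepalives, so divide by 19.
    print(name, pending, "bytes unacknowledged,", pending / KEEPALIVE_LEN, "keepalive-sized messages")
```

If this reading is right, both routers had the same amount of data sent but unacknowledged when the hold timer fired, which may point to segments on this session not being delivered or acknowledged in either direction rather than to one side not sending keepalives.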
I need help understanding these log messages to determine whether the problem is in the BGP configuration. I've read about the Cease and connection collision events, but I don't see why they occur with the current configuration. I have similar configurations between the same devices (on another VLAN) and there is no problem there.
Regards!