Juniper facing fatal clock flaw that impacts Cisco routers, switches

The fatal clock timing flaw that causes a variety of switches, routers and security appliances to crash and die after about 18 months of service apparently also affects some Juniper products.

Cisco was the first vendor to post a notice about the fatal clock failure earlier this month, saying the affected products include some of the company’s most widely deployed, such as certain models of its 4000 Series Integrated Services Routers, Nexus 9000 Series switches, ASA security devices and Meraki Cloud Managed Switches. Clock components are critical to the synchronization of multiple levels of a given device.

Cisco wrote: “In some units, we have seen the clock signal component degrade over time. Although the Cisco products with this component are currently performing normally, we expect product failures to increase over the years, beginning after the unit has been in operation for approximately 18 months. Once the component has failed, the system will stop functioning, will not boot, and is not recoverable.”

In what looks to be a screenshot of a Juniper Technical Service Bulletin posted on Reddit’s Networking subreddit this week, Juniper is now telling customers a similar story: “Although we believe the Juniper products with this component are performing normally as of February 13, 2017 the [listed] Juniper products could after the product has been in operation for at least 18 months begin to exhibit symptoms such as the inability to boot, or cease to operate. Recovery in the field is not possible. Juniper product with this supplier’s component were first placed into service on January 2016. Juniper is working with the component supplier to implement a remediation. In addition, Juniper’s spare parts depots will be purged and updated with remediated products.”

The products in the warning comprise 13 Juniper switches, routers and other products, including the MPC7E 10G, MPC7E (multi rate), MX2K-MPC8E, EX9200 Ethernet switches and PTX3000 integrated photonic line card.

For its part, Juniper Networks said in a statement today that it “is aware of an issue related to a component manufactured by a supplier which impacts a limited set of our product line. We are currently working directly with any impacted customers on a swift solution.”

Neither Cisco nor Juniper has been willing to identify the killer clock signaling component, but the problems coincide with difficulties Intel has described with its Atom C2000 chip, which is used by a number of hardware makers.

It has been widely reported that problems with Atom hurt Intel’s 2016 Q4 earnings. CFO Robert Swan is quoted in an earnings call transcript on the Seeking Alpha website as saying: “But secondly, and a little bit more significant, we were observing a product quality issue in the fourth quarter with slightly higher expected failure rates under certain use and time constraints, and we established a reserve to deal with that. We think we have it relatively well-bounded with a minor design fix that we’re working with our clients to resolve. So, those two one-timers in the fourth quarter weighed on DCG margins, and we do not expect that to continue in 2017.”

The IDG News Service recently wrote of the Atom’s troubles, reporting in January that Intel added an erratum to the Atom C2000 documentation stating systems with the chip “may experience [an] inability to boot or may cease operation.”

The chip is the last in Intel’s short-lived line of low-power Atom chips for servers. It was used in microservers but also in networking equipment from companies like Cisco, which has issued an advisory about a product defect related to a component whose clock signal degrades over time. A degraded clock signal hurts the ability of the chip to carry out tasks. Intel is trying to fix the issue but declined to comment on when it’ll deliver an update, the IDG story stated.

“There’s a board level workaround that we are sharing with customers now,” an Intel spokesman said in an email to the IDG News Service. “Additionally, we are implementing and validating a minor silicon fix in a new product [update].”

While the Intel technology may or may not be at the heart of the problem, so far only Juniper and Cisco have released notices about the issue. Contacted by Network World, a number of vendors (including Avaya and Arista) have confirmed their products don’t use the faulty component.

For example, Big Switch said it provides SDN-based offerings (Big Cloud Fabric and Big Monitoring Fabric), which are deployed with third-party open networking (whitebox / britebox) switch hardware. “As soon as we became aware of the ‘faulty clock hardware’ issue, our team investigated it and our initial analysis indicated that it is purely hardware-related and no Big Switch software is affected. We will continue to monitor the situation closely with our open networking switch partners.”

Brocade said it does not use the affected control processor in any of its products. “We have not experienced any related failures, nor have we been contacted by any of our suppliers regarding an issue with their components or degradation of their clock signals.”

HPE and Dell have not responded to questions about the clock technology, though both are rumored to use it. Extreme declined to comment on the situation.

From Cisco’s site, here are a few of the FAQs about the problem. [A full version is available here].
