How the chip industry is keeping calm during a 'Meltdown'

In early January, chip manufacturers rushed to their own defense following the early disclosure of two massive chip flaws dating back a couple of decades. However, a company's reputation depends on the integrity of both its products and its ethics.

Intel, along with AMD and ARM, blurred the lines of ethical responsibility in terms of the safety of customers' data. In the time between Intel first learning of the flaws and their disclosure, security patches were issued relentlessly by chip manufacturers and partnered vendors such as Microsoft and Google.

However, Intel's patches have not fixed the underlying issue.

If someone figures out how to exploit the flaws, they can basically gain "access to all of your secrets that you have stored in memory," said Werner Haas, CTO of Cyberus Technology, in an interview with CIO Dive. But twenty-something years ago, processors were not quite powerful enough to run malicious code that could access that memory data.

While the flaws date back two decades, actual risk is only about 10 years old. "The principle itself goes back to the mid-1990s; it's hard to imagine that you're really able to exploit that simply because the processor pipelines were not deep enough," said Haas.

The capabilities were simply "not sophisticated enough that you could run enough code in parallel in order to do this kind of information disclosure."

But processors developed in the last 10 years could be fair game, according to Thomas Prescher, architect at Cyberus Technology.

When Intel found out about the flaws

Uncovering the Meltdown vulnerability was a "lucky coincidence" followed by "a very strange feeling" because of the prospect of telling a major company that the vast majority of its chips contain a flaw dating back about two decades, according to Prescher.

Prescher is credited with finding the Meltdown flaw without any "insight into Intel architecture," according to Haas. "This was basically taking an idea that was out there, floating on the internet, and turning it into working, proof of concept code."

Upon finding the flaw in November, Cyberus reached out to internal contacts and senior fellows at Intel to ensure the discovery would "end up in the right hands," according to Haas.

After opening the conversation with Intel, Cyberus learned other organizations had already informed Intel, raising the same concerns about potentially impacted chips. Google Project Zero reported its findings to Intel as early as June of last year.

"The security of customers' and end users’ systems and data is a critical priority for Intel. Once we were informed by Google Project Zero, we immediately went into action to validate the issue, deeply understand it technically and work to develop mitigations," according to an Intel spokesperson, in an emailed statement to CIO Dive.

"Our efforts around these mitigations and updates continue around the clock, and we are keeping our customers and technology ecosystem up to date on our progress."

What about the original embargo?

Intel originally informed Cyberus that its expected embargo date to inform customers of faulty chips would be Jan. 9. But news broke early, which forced Intel to respond defensively, assuring customers that Intel chips were not the only ones that could be exploited.

The "embargo date had to be thrown away," according to Paul Kocher, an independent cryptography and computer security expert who contributed to research on the vulnerabilities. Companies, including Intel, were afraid that if the flaws went unaddressed for too long "bad people would know what to do."

Still, a slew of unreliable patches emerged from Intel. Intel's first round of patches elicited another patch after customers complained about rebooting and performance issues. Nearly two weeks later, Intel found the root cause of the reboot issues and advised customers to halt all patch deployments, even though it did not yet have a replacement patch ready.

"It's unbelievable ... microcode programmers had always been a pretty scarce resource," said Haas. "I never imagined it being that bad because Intel had plenty of lead time getting something working ... it's really kind of appalling if you compare the quality assurance that goes into the silicon side of the chips and the software side."

On Monday, Microsoft issued its own update to address Intel's rebooting problem. The update is available for devices running Windows 7, Windows 8.1 and Windows 10. Microsoft maintains there are no known reports that "Spectre variant 2 has been used to attack customers."

Where the weaknesses lie

In recent years, chip security has fallen by the wayside in favor of performance, according to Kocher.

The culprit behind the flaws could be the industry's tendency to trade security for performance. The vulnerabilities take advantage of a processor feature called speculative execution, a performance optimization that makes programs run faster.

The branch predictor is the piece of hardware that makes speculative execution possible. It "predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known," said Jann Horn, the Google Project Zero researcher credited with finding the CPU flaws, in a team update.

The branch predictor can contribute to security weaknesses, and it essentially acts as a predictive car, according to Kocher.

The "car," or branch predictor, guesses where it should go based on past actions. If the prediction was wrong, the car corrects itself and starts over. However, traces of the original actions are left behind, which is "where [the processor] can lose safety," said Kocher.
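To make the analogy concrete, here is a minimal sketch in Python of a toy processor that guesses a branch, does the work early, and rolls back when the guess was wrong. It is an illustration only: the `ToyCPU` class, its majority-vote predictor and its set-based "cache" are assumptions invented for this example and do not describe how any real chip is built.

```python
# Toy model only: ToyCPU and everything in it is hypothetical.
class ToyCPU:
    def __init__(self, memory):
        self.memory = memory   # address -> value
        self.cache = set()     # addresses whose data has been pulled into the cache
        self.register = None   # state the running program is allowed to see
        self.history = []      # outcomes of past branches (True = taken)

    def predict(self):
        # Guess from past actions: majority of the last three outcomes.
        recent = self.history[-3:]
        return sum(recent) * 2 > len(recent)

    def load(self, address):
        self.cache.add(address)          # side effect that a rollback does not undo
        return self.memory[address]

    def branch(self, actually_taken, address):
        """Model 'if condition: register = memory[address]' with speculation."""
        saved = self.register
        guess = self.predict()
        if guess:
            self.register = self.load(address)      # work done before the truth is known
        if guess != actually_taken:
            self.register = saved                   # wrong guess: restore visible state
            if actually_taken:
                self.register = self.load(address)  # redo the work on the real path
        self.history.append(actually_taken)
        return guess == actually_taken


cpu = ToyCPU({0x10: "public", 0x20: "secret"})
for _ in range(3):
    cpu.branch(True, 0x10)                 # train the predictor to expect "taken"
guessed_right = cpu.branch(False, 0x20)    # this branch should not touch 0x20
print(guessed_right, cpu.register, 0x20 in cpu.cache)
```

In this toy, the wrong guess is corrected and the program never sees the secret value in its register, but the address of the secret is still sitting in the cache. That leftover trace is the kind of footprint Kocher describes.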

What companies should do now

The Computer Emergency Readiness Team (CERT) concluded that the only true resolution for the flaws is to replace all CPU hardware. The patches distributed by manufacturers and vendors are only software updates that mitigate the risk.

While this is a bad security problem, "don't panic," said Haas. The flaws do not necessarily mean all CPUs have to be tossed, because replacement CPUs are not even on the market yet.

Though the security patches have caused stability issues, researchers agree that the patches are worth the performance hit.

It is vital that companies perform their own risk assessment, especially those that work off public servers. Companies that share public servers risk running code from the internet through things like JavaScript, according to Haas.

Another long-term approach to mitigating risk is reevaluating where sensitive workloads are running, such as whether they are "sharing the same processor core through hyper-threading," another performance enhancer, according to Kocher.
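As a rough illustration of what that reevaluation can involve, the Python sketch below checks whether two logical CPUs are hyper-threading siblings on the same physical core. It assumes a Linux host, where the kernel lists a CPU's siblings in the `thread_siblings_list` file under sysfs; the function names are invented for this example, and other operating systems expose this information differently.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-1,8-9' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Read the logical CPUs that share a physical core with `cpu` (Linux only)."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


def share_core(cpu_a, cpu_b, siblings_of_a):
    """True if two different logical CPUs sit on the same physical core."""
    return cpu_a != cpu_b and cpu_b in siblings_of_a


# Example with a hard-coded sibling list, so it runs on any machine.
print(share_core(0, 4, parse_cpu_list("0,4\n")))
```

On a Linux machine, `share_core(0, 4, thread_siblings(0))` would perform the same check against the real topology, which could inform whether a sensitive workload should be pinned away from untrusted code.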

Additionally, having sensors that detect unusual network behavior or activity can help find the data that is being exfiltrated. Data that is just being viewed is harder, if not impossible, to detect.
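A minimal sketch of the idea behind such a sensor, in Python: compare each interval's outbound traffic with a short rolling baseline and flag sharp spikes. The function name, the five-interval baseline, the threshold of three standard deviations and the sample numbers are all assumptions chosen for illustration, not a description of any commercial product.

```python
from statistics import mean, stdev


def unusual_intervals(outbound_bytes, baseline_size=5, threshold=3.0):
    """Return indexes where outbound traffic spikes far above the recent baseline."""
    flagged = []
    for i in range(baseline_size, len(outbound_bytes)):
        window = outbound_bytes[i - baseline_size:i]
        mu = mean(window)
        sigma = stdev(window) or 1.0   # avoid dividing by zero on a flat baseline
        if (outbound_bytes[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical per-minute outbound byte counts with one large transfer.
traffic = [100, 110, 95, 105, 100, 102, 5000, 101]
print(unusual_intervals(traffic))
```

A check like this only sees data leaving the network. It offers nothing against data that is read in place and never sent anywhere.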