At AWS re:Invent 2019, Amazon officially unveiled its new Inferentia chip, a custom processor developed for machine learning.
AWS Inferentia is a custom chip designed to make machine learning inference (that is, using models you have already trained to perform tasks and make predictions) faster and more cost-effective under AWS's pay-as-you-go usage model. A large amount of on-chip memory also promises low latency.
Each Inferentia chip can achieve up to 128 TOPS (trillion operations per second) of inference throughput, and multiple chips can be combined for workloads that need to exceed that limit.
As TechCrunch reports, Amazon's new Inf1 instances promise up to 2,000 TOPS, no less. Compared to a regular EC2 G4 instance built on Nvidia's latest T4 GPUs, Amazon claims the new instances deliver three times the throughput at a 40 percent lower cost per inference, making them a compelling proposition. Currently, Inferentia is available only with Amazon EC2, but it will come to other Amazon services soon enough, including SageMaker and Amazon Elastic Inference, with support for models trained in popular frameworks.
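The headline figures can be sanity-checked with simple arithmetic. The sketch below assumes the per-chip and per-instance TOPS numbers quoted above; the implied 16-chip count is an inference from those figures, not a spec stated in the article.

```python
# Sanity-check the quoted Inferentia/Inf1 figures.
# Per-chip and per-instance TOPS come from the article; the chip count
# per top-end instance is only implied by dividing one by the other.

TOPS_PER_CHIP = 128          # "up to 128 TOPS" per Inferentia chip
INSTANCE_TOPS_CLAIM = 2000   # Inf1: "up to 2,000 TOPS"

chips_implied = INSTANCE_TOPS_CLAIM / TOPS_PER_CHIP
print(f"Chips implied by the claim: {chips_implied:.1f}")  # roughly 16 chips

# Relative cost per inference: 40% lower than a G4/T4 instance
# (G4 normalized to 1.0), per Amazon's claim.
g4_cost = 1.0
inf1_cost = g4_cost * (1 - 0.40)
print(f"Inf1 cost per inference vs G4: {inf1_cost:.2f}")
```

So the "up to 2,000 TOPS" claim is consistent with roughly sixteen 128-TOPS chips in a single instance.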
Amazon notes, "Neuron consists of a compiler, runtime, and profiling tools, and is integrated with common machine learning frameworks such as TensorFlow, PyTorch, and MXNet to achieve optimal performance of EC2 Inf1 instances."
In addition, Amazon notes that Inferentia's lower cost is part of a broader effort to make machine learning accessible to all developers. While the Inferentia chip may not be an immediate threat to Nvidia, the path Amazon is charting and the customers its relatively affordable cloud-based model may attract could eventually cut into Nvidia's revenue in the machine learning arena. And of course, Amazon itself no longer has to buy Tesla graphics solutions when it runs inference on its own hardware.
Custom chips have obvious benefits in terms of the performance they can achieve, and Amazon is not the only innovator with ideas in the field: Google has been developing its own Tensor Processing Unit (TPU) solution for several years.