IBM claims its neural computer has reached a record-breaking AI model training time

In a technical paper quietly published earlier this year, IBM described the so-called IBM Neural Computer, a reconfigurable parallel processing system designed for the research and development of new AI algorithms and for computational neuroscience. This week, the company released a preprint describing the first application demonstrated on the neural computer: a deep “neuroevolution” system that combines a hardware implementation of the Atari 2600, image preprocessing, and AI algorithms in an optimized pipeline. The coauthors report results competitive with state-of-the-art techniques, but perhaps more significantly, the system achieves a record training throughput of 1.2 million frames per second.

The neural computer is something of a shot across the bow in the computational AI arms race. According to a recent analysis published by OpenAI, the amount of compute used in the largest AI training runs grew more than 300,000-fold from 2012 to 2018, with a doubling time of 3.5 months, far outpacing Moore’s law. Against this backdrop, supercomputers such as the upcoming Intel-based Aurora at the Department of Energy’s Argonne National Laboratory and the AMD-powered Frontier at Oak Ridge National Laboratory promise more than an exaflop (a quintillion floating-point operations per second) of computing power.
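The OpenAI figures can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted above to count how many doublings a 300,000-fold increase takes at a 3.5-month doubling time, then compares that with a Moore’s-law pace. The 24-month Moore’s-law doubling period is an assumption (it is often quoted as anywhere from 18 to 24 months).

```python
import math

# Figures quoted in the article (from OpenAI's analysis).
GROWTH_FACTOR = 300_000      # increase in compute for the largest training runs
DOUBLING_MONTHS = 3.5        # reported doubling time

# Number of doublings needed to reach a 300,000x increase, and how long that takes.
doublings = math.log2(GROWTH_FACTOR)
months = doublings * DOUBLING_MONTHS

# ASSUMPTION: Moore's law modeled as one doubling every 24 months.
MOORE_DOUBLING_MONTHS = 24
moore_factor = 2 ** (months / MOORE_DOUBLING_MONTHS)

print(f"{doublings:.1f} doublings over {months / 12:.1f} years")   # 18.2 doublings over 5.3 years
print(f"Moore's law over the same span: ~{moore_factor:.0f}x")     # ~6x
```

Roughly 18 doublings spread over about five years fits the 2012 to 2018 window, while a 24-month doubling over the same span would yield only about a sixfold gain.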

Video games are a well-established platform for AI and machine learning research. They have grown in importance not only because they are readily available and cheap to run at scale, but also because in certain domains, such as reinforcement learning, where an AI interacts with its environment to learn behaviors that maximize rewards, game scores serve as a direct reward signal. AI algorithms developed in games have also proven adaptable to more practical applications, such as protein folding prediction. And if the results from IBM’s neural computer prove repeatable, the system could be used to speed up the development of such algorithms.

The neural computer

The IBM neural computer consists of 432 nodes (27 nodes on each of 16 modular cards) based on field-programmable gate arrays (FPGAs) from Xilinx, a longtime strategic partner of IBM. (FPGAs are integrated circuits that can be configured after manufacturing.) Each node comprises a Xilinx Zynq system-on-chip (a dual-core ARM Cortex-A9 processor paired with an FPGA on the same die) and 1 GB of dedicated RAM. The nodes are arranged in a 3D mesh topology and interconnected vertically by electrical connections called through-silicon vias, which pass completely through silicon wafers or dies.
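The article does not give the mesh dimensions, so the following sketch is purely illustrative. It assumes, hypothetically, that each card’s 27 nodes form a 3×3×3 cube and that the 16 cards stack along one axis, giving a 3×3×48 mesh of 432 nodes. It shows what a 3D mesh topology means in practice: each node links only to its immediate neighbors along the three axes, so interior nodes have six links and corner nodes three. The `neighbors` function is a hypothetical helper, not part of any IBM software.

```python
from itertools import product

# Figures quoted in the article.
CARDS = 16
NODES_PER_CARD = 27

# HYPOTHETICAL layout: a 3x3x3 cube per card, with the 16 cards stacked
# along the z axis, giving a 3 x 3 x 48 mesh. Not stated in the article.
DIMS = (3, 3, 3 * CARDS)

def neighbors(node, dims=DIMS):
    """Return the coordinates of the up-to-six mesh neighbors of `node`.

    In a mesh without wraparound links, a node connects only to nodes that
    differ by one step along a single axis; nodes on a face, edge or corner
    therefore have fewer neighbors than interior nodes.
    """
    result = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] += step
            if 0 <= coord[axis] < dims[axis]:
                result.append(tuple(coord))
    return result

# Every (x, y, z) coordinate in the mesh.
all_nodes = list(product(*(range(d) for d in DIMS)))
assert len(all_nodes) == CARDS * NODES_PER_CARD  # 432 nodes

print(len(neighbors((1, 1, 1))))   # 6  (interior node)
print(len(neighbors((0, 0, 0))))   # 3  (corner node)
```

Unlike a torus, a plain mesh has no wraparound links, which is why nodes on the boundary have fewer neighbors.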

Above: A single card from IBM’s Neural Computer.

Photo credit: IBM

On the networking side, the FPGAs provide access to the physical communication links between cards, which can be used to set up several different communication channels. A single card can theoretically support transfer speeds of up to 432 GB per second, but the neural computer’s network interfaces can be adjusted and progressively optimized to best suit a given application.

“The availability of FPGA resources on each node enables application-specific processor offloading, a feature that is not available on any parallel computer of this size known to us,” write the coauthors of a paper describing the neural computer’s architecture. “[M]ost of the performance-critical steps [are] offloaded and optimized on the FPGA, with the ARM [processor] … providing additional support.”

Playing Atari games with AI

The researchers used 26 of the 27 nodes on each card, running experiments on a total of 416 nodes. Each of the 416 FPGAs ran two instances of the Atari game application, for a total of 832 instances running in parallel. Each instance extracted frames from a given Atari 2600 game, preprocessed the images, ran them through machine learning models, and performed an action within the game.
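The instance count follows directly from the figures above. This short sketch reproduces it and divides the reported 1.2 million frames per second across the 832 instances. The even split is a simplifying assumption, since the article reports only the aggregate figure.

```python
# Figures quoted in the article.
CARDS = 16
NODES_USED_PER_CARD = 26        # 26 of the 27 nodes on each card
INSTANCES_PER_FPGA = 2          # two game instances per FPGA
REPORTED_FPS = 1_200_000        # reported aggregate training throughput

nodes_used = CARDS * NODES_USED_PER_CARD        # FPGAs taking part in the runs
instances = nodes_used * INSTANCES_PER_FPGA     # parallel game instances

# ASSUMPTION: throughput is split evenly across all instances.
fps_per_instance = REPORTED_FPS / instances

print(nodes_used, instances)                              # 416 832
print(f"{fps_per_instance:,.0f} frames/s per instance")   # 1,442 frames/s per instance
```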

To get the best performance, the team eschewed emulating the Atari 2600, opting instead to use the FPGAs to implement the console’s functionality at higher frequencies. They used a framework from the open-source MiSTer project, which recreates consoles and arcade machines with modern hardware, and boosted the Atari 2600’s processor clock from 3.58 MHz to 150 MHz. This produced roughly 2,514 frames per second, compared with the original 60 frames per second.
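The 2,514 frames-per-second figure is consistent with frame rate scaling linearly with clock speed. The sketch below assumes that linear relationship (the article gives only the endpoints) and reproduces the figure from the 3.58 MHz and 150 MHz clocks and the original 60 frames per second. The `scaled_fps` helper is hypothetical.

```python
# Figures quoted in the article.
ORIGINAL_CLOCK_MHZ = 3.58
BOOSTED_CLOCK_MHZ = 150.0
ORIGINAL_FPS = 60

def scaled_fps(clock_mhz, base_clock=ORIGINAL_CLOCK_MHZ, base_fps=ORIGINAL_FPS):
    """Frame rate of a hardware console core, ASSUMING it scales linearly with clock."""
    return base_fps * clock_mhz / base_clock

speedup = BOOSTED_CLOCK_MHZ / ORIGINAL_CLOCK_MHZ
fps = scaled_fps(BOOSTED_CLOCK_MHZ)

print(f"{speedup:.1f}x clock speedup")   # 41.9x clock speedup
print(round(fps))                        # 2514
```

Because the console logic is implemented directly on the FPGA rather than emulated in software, raising the clock speeds up the whole game loop proportionally, which is what the linear model captures.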

In the image preprocessing step, IBM’s application converted the frames from color to grayscale, eliminated flickering, downscaled the images to a smaller resolution, and stacked the frames in groups of four. These were then passed to an AI model that reasoned about the game environment and to a submodule that selected the action for the next frames by determining the maximum reward predicted by the AI model.
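The preprocessing and action-selection steps described above can be sketched in NumPy. This is a software illustration, not IBM’s FPGA implementation, and several details are assumptions: flicker is removed by taking the per-pixel maximum of two consecutive frames (a common trick in Atari reinforcement learning), grayscale conversion uses ITU-R BT.601 luminance weights, frames are downscaled to 84×84 by nearest-neighbor sampling, and the action is chosen greedily as the one with the highest predicted reward. The names `FrameStacker` and `select_action` are hypothetical.

```python
from collections import deque

import numpy as np

FRAME_H, FRAME_W = 210, 160     # native Atari 2600 frame size (RGB)
OUT_H, OUT_W = 84, 84           # ASSUMED target resolution
STACK = 4                       # frames are stacked in groups of four

def to_grayscale(rgb):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def deflicker(prev_rgb, curr_rgb):
    """ASSUMED flicker removal: per-pixel max of two consecutive frames."""
    return np.maximum(prev_rgb, curr_rgb)

def resize_nearest(img, out_h=OUT_H, out_w=OUT_W):
    """Nearest-neighbor downscaling using integer index arrays."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols[None, :]]

class FrameStacker:
    """Keeps the most recent STACK preprocessed frames as the model input."""

    def __init__(self, stack=STACK):
        self.frames = deque(maxlen=stack)

    def push(self, prev_rgb, curr_rgb):
        frame = resize_nearest(to_grayscale(deflicker(prev_rgb, curr_rgb)))
        if not self.frames:
            # Start of an episode: pad the stack with copies of the first frame.
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            self.frames.append(frame)   # the oldest frame drops out automatically
        return np.stack(list(self.frames))   # shape (STACK, OUT_H, OUT_W)

def select_action(predicted_rewards):
    """Greedy choice: the action with the highest predicted reward."""
    return int(np.argmax(predicted_rewards))

# Usage with random stand-in frames and made-up model outputs.
rng = np.random.default_rng(0)
stacker = FrameStacker()
prev = rng.integers(0, 256, (FRAME_H, FRAME_W, 3), dtype=np.uint8)
curr = rng.integers(0, 256, (FRAME_H, FRAME_W, 3), dtype=np.uint8)
state = stacker.push(prev, curr)
print(state.shape)                                   # (4, 84, 84)
action = select_action(np.array([0.1, 0.7, 0.2]))
print(action)                                        # 1
```

Padding the stack with copies of the first frame at the start of an episode is one common convention; starting from blank frames is another.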

Above: Results of the experiments.

Photo credit: IBM

Another algorithm, a genetic algorithm, ran on an external computer connected to the neural computer via a PCIe link. It evaluated the performance of each instance and selected the top performers as “parents” of the next generation of instances.
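The article does not describe the genetic operators IBM used. As a hedged illustration, the sketch below implements one generation of a simple genetic algorithm of the kind common in deep neuroevolution: truncation selection keeps the top-scoring instances as parents, each child is a randomly chosen parent’s weight vector plus Gaussian noise, and the single best individual is carried over unchanged (elitism). The function name `next_generation`, the mutation scale, and the toy 10-parameter models are assumptions; the population size of 832 simply matches the instance count above.

```python
import numpy as np

def next_generation(population, fitness, num_parents, sigma, rng, elitism=True):
    """One generation of a simple genetic algorithm (hypothetical operators).

    population: array (pop_size, num_params), one weight vector per instance
    fitness:    array (pop_size,), e.g. the game score of each instance
    Returns the new population and the indices of the selected parents.
    """
    pop_size = population.shape[0]

    # Truncation selection: indices of the `num_parents` best instances, best first.
    parent_idx = np.argsort(fitness)[::-1][:num_parents]
    parents = population[parent_idx]

    children = []
    if elitism:
        children.append(parents[0].copy())   # keep the best individual unchanged
    while len(children) < pop_size:
        parent = parents[rng.integers(num_parents)]
        # Gaussian mutation of the parent's weights.
        children.append(parent + sigma * rng.standard_normal(parent.shape))
    return np.stack(children), parent_idx

# Usage with random stand-in weights and scores.
rng = np.random.default_rng(42)
pop = rng.standard_normal((832, 10))     # 832 instances, toy 10-parameter models
scores = rng.random(832)                 # stand-in for game scores
new_pop, parents = next_generation(pop, scores, num_parents=20, sigma=0.01, rng=rng)
print(new_pop.shape)                     # (832, 10)
```

Mutation-only schemes like this avoid crossover entirely, which keeps each generation cheap to compute on the host while the game instances do the expensive evaluation.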

Across five experiments, IBM researchers ran 59 Atari 2600 games on the neural computer. The results suggest the approach was not data-efficient compared with other reinforcement learning techniques: it required a total of 6 billion game frames and failed at challenging exploration games such as Montezuma’s Revenge and Pitfall. But it managed to outperform a popular baseline, a Deep Q-Network (an architecture developed by DeepMind), in 30 of the 59 games after 6 minutes of training (200 million training frames), compared with the Deep Q-Network’s 10 days of training. With 6 billion training frames, it outperformed the Deep Q-Network in 36 games while requiring two orders of magnitude less training time (2 hours and 30 minutes).
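The training-time comparison is easy to check from the quoted figures. The sketch below converts the Deep Q-Network’s 10 days to minutes, computes the speedups for the 6-minute and 2.5-hour runs, and derives the average frame rate implied by each run. Dividing total frames by wall-clock time is an assumption about how the quoted totals map onto the quoted durations.

```python
import math

# Figures quoted in the article.
DQN_TRAINING_DAYS = 10
SHORT_RUN_MINUTES = 6             # 200 million training frames
LONG_RUN_MINUTES = 2 * 60 + 30    # 6 billion training frames
SHORT_RUN_FRAMES = 200e6
LONG_RUN_FRAMES = 6e9

dqn_minutes = DQN_TRAINING_DAYS * 24 * 60     # 14,400 minutes

speedup_short = dqn_minutes / SHORT_RUN_MINUTES
speedup_long = dqn_minutes / LONG_RUN_MINUTES
orders_of_magnitude = math.log10(speedup_long)

# Average throughput implied by each run (frames / seconds of wall-clock time).
fps_short = SHORT_RUN_FRAMES / (SHORT_RUN_MINUTES * 60)
fps_long = LONG_RUN_FRAMES / (LONG_RUN_MINUTES * 60)

print(f"6-minute run: {speedup_short:,.0f}x faster, ~{fps_short:,.0f} frames/s")
# 6-minute run: 2,400x faster, ~555,556 frames/s
print(f"2.5-hour run: {speedup_long:.0f}x faster ({orders_of_magnitude:.2f} orders of magnitude), ~{fps_long:,.0f} frames/s")
# 2.5-hour run: 96x faster (1.98 orders of magnitude), ~666,667 frames/s
```

A 96-fold reduction is just under two orders of magnitude, matching the comparison above. Both implied averages fall below the 1.2 million frames-per-second record; the article does not say how that figure relates to these end-to-end runs.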
