g-f(2)861 THE NEW WORLD NEWS (1/26/2022), Quanta magazine, Researchers Build AI That Builds AI

  • By using hypernetworks, researchers can now preemptively fine-tune artificial neural networks, saving some of the time and expense of training.
  • Boris Knyazev of the University of Guelph in Ontario and his colleagues have designed and trained a “hypernetwork” — a kind of overlord of other neural networks — that could speed up the training process. Given a new, untrained deep neural network designed for some task, the hypernetwork predicts the parameters for the new network in fractions of a second, and in theory could make training unnecessary. Because the hypernetwork learns the extremely complex patterns in the designs of deep neural networks, the work may also have deeper theoretical implications.

Anil Ananthaswamy

Anil Ananthaswamy is a journalist and author. He is a 2019-20 MIT Knight Science Journalism fellow. His latest book, Through Two Doors at Once, is about quantum mechanics and the double-slit experiment. He is a former deputy news editor for New Scientist magazine and currently a freelance feature editor for PNAS’s Front Matter. Besides Quanta, he writes for New Scientist, Scientific American, Knowable and Undark, among others. He won the UK Institute of Physics’ Physics Journalism award and the British Association of Science Writers’ award for Best Investigative Journalism. His first book, The Edge of Physics, was voted book of the year in 2010 by Physics World, and his second book, The Man Who Wasn’t There, was long-listed for the 2016 PEN/E. O. Wilson Literary Science Writing Award.

  • For now, the hypernetwork performs surprisingly well in certain settings, but there’s still room for it to grow, which is only natural given the magnitude of the problem. If the researchers can solve it, “this will be pretty impactful across the board for machine learning,” said Petar Veličković of DeepMind.
  • Training the Trainer
    • Knyazev and his team call their hypernetwork GHN-2, and it improves upon two important aspects of the graph hypernetwork built by Mengye Ren and colleagues.
    • First, they relied on Ren’s technique of depicting the architecture of a neural network as a graph. Each node in the graph encodes information about a subset of neurons that do some specific type of computation. The edges of the graph depict how information flows from node to node, from input to output.
    • The second idea they drew on was the method of training the hypernetwork to make predictions for new candidate architectures. This requires two other neural networks. 
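To make the graph depiction described above concrete, here is a minimal sketch of a neural network architecture stored as a directed graph. It is an illustration only, not the encoding used by Ren's hypernetwork or by GHN-2: the `ArchGraph` class, the operation names and the toy six-node architecture are all assumptions made for this example. Each node records the type of computation it performs, each edge records where information flows, and a topological sort recovers the order from input to output.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ArchGraph:
    """Toy architecture graph (hypothetical): nodes are operations, edges are data flow."""
    ops: dict = field(default_factory=dict)    # node id -> operation type
    edges: list = field(default_factory=list)  # (source id, target id) pairs

    def add_node(self, node_id: int, op: str) -> None:
        self.ops[node_id] = op

    def add_edge(self, src: int, dst: int) -> None:
        if src not in self.ops or dst not in self.ops:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((src, dst))

    def topological_order(self) -> list:
        """Order nodes from input to output using Kahn's algorithm."""
        indegree = {n: 0 for n in self.ops}
        successors = {n: [] for n in self.ops}
        for src, dst in self.edges:
            indegree[dst] += 1
            successors[src].append(dst)
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for nxt in successors[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.ops):
            raise ValueError("architecture graph contains a cycle")
        return order


def build_toy_architecture() -> ArchGraph:
    """Assumed example: two conv layers with a skip connection, then a classifier."""
    g = ArchGraph()
    for node_id, op in enumerate(["input", "conv", "relu", "conv", "add", "linear"]):
        g.add_node(node_id, op)
    # The (1, 4) edge is the skip connection that bypasses relu and the second conv.
    for src, dst in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4), (4, 5)]:
        g.add_edge(src, dst)
    return g


toy = build_toy_architecture()
flow = [toy.ops[n] for n in toy.topological_order()]
print(" -> ".join(flow))
```

A graph like this is the kind of input a graph hypernetwork could consume: the node labels say what each block computes, and the edges say how the blocks are wired together.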
  • Impressive Results
    • The real test, of course, was in putting GHN-2 to work. Once Knyazev and his team trained it to predict parameters for a given task, such as classifying images in a particular data set, they tested its ability to predict parameters for any random candidate architecture. 
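The idea of predicting parameters for an arbitrary candidate architecture, instead of training them, can be sketched in a few lines. The code below is a deliberately simplified stand-in and not GHN-2. The `ToyHyperNetwork` class, its sine-based features, and the use of plain layer sizes rather than a graph are all assumptions made for illustration, and its own parameters `theta` are random here, where a real hypernetwork would learn them. What the sketch shows is only the interface: one small model emits weights of the right shape for any candidate, and the candidate then runs a forward pass with no training of its own.

```python
import math
import random


class ToyHyperNetwork:
    """Hypothetical stand-in for a hypernetwork; it is NOT GHN-2."""

    def __init__(self, num_features: int = 4, seed: int = 0):
        rng = random.Random(seed)
        # The hypernetwork's own parameters. Random here; learned in a real system.
        self.theta = [rng.uniform(-1.0, 1.0) for _ in range(num_features)]

    def _weight(self, layer: int, row: int, col: int, fan_in: int) -> float:
        # Fixed positional features of one weight entry, mixed by theta.
        feats = [math.sin((k + 1) * (layer + 1) + row - col)
                 for k in range(len(self.theta))]
        raw = sum(t * f for t, f in zip(self.theta, feats))
        return raw / math.sqrt(fan_in)  # scale by fan-in so outputs stay moderate

    def predict(self, layer_sizes):
        """Return one weight matrix (fan_out rows x fan_in columns) per layer."""
        params = []
        for layer, (fan_in, fan_out) in enumerate(zip(layer_sizes, layer_sizes[1:])):
            params.append([[self._weight(layer, r, c, fan_in) for c in range(fan_in)]
                           for r in range(fan_out)])
        return params


def forward(params, x):
    """Run the candidate network with predicted weights; ReLU between layers."""
    for i, weights in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        if i < len(params) - 1:
            x = [max(0.0, v) for v in x]
    return x


hyper = ToyHyperNetwork()
# Two candidate architectures of different depth and width, neither ever trained.
for candidate in ([8, 16, 3], [8, 4, 4, 3]):
    predicted = hyper.predict(candidate)
    outputs = forward(predicted, [0.5] * candidate[0])
    print(candidate, "->", len(outputs), "outputs")
```

Because `theta` is untrained, the outputs here carry no meaning. Evaluating a real hypernetwork would mean measuring how accurate such predicted-parameter networks are on held-out data, which is the kind of test the passage above describes.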

