Comments on Computational Complexity: "Learning About Learning" (Lance Fortnow)

Re "how much of an art machine learning still is": a deep neural network involves a lot of engineering choices, not just gradient-descent methods and threshold functions but also the large-scale architecture of the connections among layers, such as whether to use a CNN, an RNN, an LSTM, etc. In comparison, SVMs, random forests, and regression have just a handful of hyperparameters to tweak.

Mitch, 2017-01-06 16:18

"Where were all these cool tools when I was a kid?" When you were a kid, the disgruntled gray hairs were envious of your tools, like a keyboard and a screen.

When "kids these days" are old, they'll be marveling at youngsters who seem to just twitch their eyelids to pilot their interstellar spacecraft.

I'm not saying you have gray hair.

Mitch, 2017-01-06 15:08

As a follow-up, the above concrete computational considerations regarding rank-jumping in tensor network representations are surveyed abstractly in "Yellow Book" comment #91 (http://www.scottaaronson.com/blog/?p=3095#comment-1725988) on Scott Aaronson's Shtetl-Optimized essay "My 116-page survey article on P vs.
NP" (of Jan 03, 2017).

As a follow-up, I will attempt to compose these two perspectives, abstract-with-concrete and Yellow-Book-with-pragmatic, in a MathOverflow and/or TCS StackExchange question regarding "Yellow Book" descriptions of rank-jumping in practical computational simulations. Such attempted compositions, from me or anyone, can rightly be appreciated as tributes to a small yet vital community, namely the proprietors of mathematical weblogs.

Math weblogs require of their proprietors a sustained personal commitment that (as it seems to me and many) crucially nourishes the vitality of the 21st century's diverse STEAM enterprises. In particular, math weblogs crucially nourish the hopeful enthusiasm of Yellow Book Era STEAM students, hundreds of millions of them in the 21st century (YIKES! :)), who will inherit and, as we can hope and even reasonably foresee, apply fresh Yellow Book understandings in marvelously extending our century's great STEAM enterprises.

This New Year's appreciation of math weblogs, and heartfelt gratitude for the sustained efforts of their oft-underappreciated proprietors, is therefore extended.

John Sidles, 2017-01-06 12:08
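To make the rank-jumping phenomenon mentioned above concrete, here is a short NumPy sketch (my illustration, assuming only NumPy) of the standard 2x2x2 example: a tensor of rank 3 that rank-2 tensors approximate arbitrarily well, so the best rank-2 approximation problem has no minimizer.

```python
import numpy as np

def outer3(a, b, c):
    # rank-one third-order tensor a (x) b (x) c
    return np.einsum('i,j,k->ijk', a, b, c)

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# W is the standard 2x2x2 example: it has tensor rank 3 ...
W = outer3(e1, e1, e2) + outer3(e1, e2, e1) + outer3(e2, e1, e1)

def rank2_approx(n):
    # ... yet this sum of just TWO rank-one terms converges to W as n grows,
    # so W lies in the closure of the rank-2 tensors ("border rank" 2):
    # no rank-2 tensor attains the infimum, and ranks "jump" in the limit.
    v = e1 + e2 / n
    return n * outer3(v, v, v) - n * outer3(e1, e1, e1)

errs = [float(np.linalg.norm(W - rank2_approx(n))) for n in (1, 10, 100, 1000)]
# the approximation error shrinks roughly like 1/n, never reaching zero
```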
Lance asks: "How many nodes should you have in your network? How many levels? Too many may take too long to train and could cause overfitting. Too few and you don't have enough parameters to create the function you need."

Algorithmic answers to these questions center on the notion of "rank-jumping" (as at least some portions of the literature call it).

Specifically, a notably student-friendly, multi-reference, multi-example survey of the rank-jumping literature is Vin de Silva and Lek-Heng Lim's "Tensor rank and the ill-posedness of the best low-rank approximation problem" (SIAM Journal on Matrix Analysis and Applications, 2008).

The de Silva/Lim survey has been concretely helpful (to me) in upgrading quantum simulation codes that dynamically and adaptively raise and lower the ranks of tensor representations. Algorithms that once were ad hoc evolve to be more nearly universal and natural (and stable, too).

Sweet! Hoorah for "Team Yellow Book"! :)

Further suggestions regarding this "Yellow Book" literature, whether in the language of "rank jumps" or "topological closure" or any other GAGA-esque terminology, would be welcome to me and many. It has been plenty challenging (for me at least) to reduce this literature's beautiful insights to concrete algorithmic practice.

John Sidles, 2017-01-06 05:20

The idea behind a convolutional net is as follows: think of images and a box. Let's define a feature over the pixels in the box, such as the existence of a vertical line, and have a neural network for it.
Now the location of this box doesn't matter for the feature, so if you are looking for vertical lines in an image you can just use the same network at every location: you can share the weights between the networks for the feature. This saves a lot of weights and makes training and inference practical.

Many important papers in machine learning are about intelligent ways of saving computation time. You really don't want the number of computation steps to grow superlinearly with respect to network depth, input size, and so on; that would make training and inference infeasible in practice. CNNs are the reason deep learning worked in practice and beat all previous algorithms in image recognition by a large margin. Machine learning requires a good deal of engineering to produce practical algorithms that you can actually run and test; even constant factors matter.

Anonymous, 2017-01-06 01:41
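The weight-sharing idea described above can be sketched in a few lines of NumPy (a toy illustration with one hand-picked kernel, not a trained network): a single shared vertical-line detector, slid over the whole image, fires on the line wherever it sits.

```python
import numpy as np

# One shared 3x3 "vertical line" detector: 9 weights reused at every image
# location, instead of separate weights per location.
kernel = np.array([[-1.0, 2.0, -1.0],
                   [-1.0, 2.0, -1.0],
                   [-1.0, 2.0, -1.0]])

def conv2d_valid(img, k):
    # naive 2-D cross-correlation ("valid" padding), enough for the idea
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

img = np.zeros((8, 8))
img[:, 5] = 1.0                 # vertical line at column 5
resp = conv2d_valid(img, kernel)

img2 = np.zeros((8, 8))
img2[:, 2] = 1.0                # the same line, shifted to column 2
resp2 = conv2d_valid(img2, kernel)
# the shared weights respond identically at the shifted location:
# resp peaks at output column 4, resp2 at output column 1, equal peak value
```

A fully connected layer computing the same 6x6 output map would need separate weights for all 36 positions; sharing the nine kernel weights is exactly the saving the comment describes.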
https://arxiv.org/abs/1611.01578

https://arxiv.org/abs/1505.00521

https://media.nips.cc/Conferences/2015/tutorialslides/wood-nips-probabilistic-programming-tutorial-2015.pdf

Anonymous, 2017-01-06 01:12

Have you tried TF's playground? (http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.20483&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)

Yuval, 2017-01-05 22:21

"Convolution nets has a special first layer that captures features of pieces of the image."

This is not correct. Convolutional neural networks have many convolutional layers, anywhere, not necessarily at the first layer. These layers exploit locality and translation invariance, two important properties of image-like data. Here is a stylized example showing how convolutional neurons can recognize higher-and-higher-level abstractions, from edges through noses to faces:

https://i.stack.imgur.com/Hl2H6.png

Dániel Varga, 2017-01-05 14:00

Small correction: recurrent nets represent time dependencies (or, more generally, dependencies along any DAG), not feedback loops.
Each computation of a recurrent net can be unfolded into a feed-forward net of depth O(input size).

Fernando Pereira, 2017-01-05 10:55
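That unfolding can be sketched in NumPy (a toy sketch with made-up layer sizes and random weights, not any particular library's API): a T-step recurrence is the same computation as a depth-T feed-forward net whose layers all share one weight pair.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x):
    # one recurrent step; W_h and W_x are the SAME at every time step
    return np.tanh(W_h @ h + W_x @ x)

def unfolded(xs, W_h, W_x):
    # the T-step recurrence written as a feed-forward net of depth T = len(xs):
    # layer t maps (h_{t-1}, x_t) to h_t, and all layers share one weight pair
    h = np.zeros(W_h.shape[0])
    for x in xs:
        h = rnn_step(h, x, W_h, W_x)
    return h

rng = np.random.default_rng(0)      # made-up sizes, just for illustration
W_h = 0.5 * rng.normal(size=(4, 4))  # hidden-to-hidden weights
W_x = 0.5 * rng.normal(size=(4, 3))  # input-to-hidden weights
xs = rng.normal(size=(5, 3))         # an input sequence of length 5
h_final = unfolded(xs, W_h, W_x)     # a depth-5 feed-forward computation
```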