URL: https://proceedings.neurips.cc/paper_files/paper/2012/file/6aca97005c68f1206823815f66102863-Paper.pdf
Title: Large Scale Distributed Deep Networks
Authors: Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, Andrew Y. Ng
Book: Advances in Neural Information Processing Systems 25
Editors: F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger
Publisher: Curran Associates
Date: 2012
Pages: 1223-1231
Subject: Neural Information Processing Systems (http://nips.cc/)

Abstract: Recent work in unsupervised feature learning and deep learning has shown that the ability to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 100x larger than previously reported in the literature and to achieve state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report the performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
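The abstract describes Downpour SGD as asynchronous stochastic gradient descent in which many model replicas independently pull parameters from and push gradients to a shared parameter server. Below is a minimal, illustrative sketch of that pattern under stated assumptions: the `ParameterServer` class, its `pull`/`push` methods, and the use of threads on a toy linear-regression problem are all inventions for illustration, not the paper's actual API or implementation (which shards both the model and the parameter server across many machines).

```python
# Minimal sketch of asynchronous parameter-server SGD in the spirit of
# Downpour SGD. All names here are illustrative assumptions, not the
# paper's API. Two workers train on disjoint data shards, each repeatedly
# pulling the current parameters and pushing gradients back, with no
# synchronization between workers' steps (so pulled parameters may be stale).
import threading
import random

class ParameterServer:
    def __init__(self, dim, lr=0.05):
        self.w = [0.0] * dim
        self.lr = lr
        # The lock only makes a single pull or push atomic; it does not
        # serialize whole training steps, so updates interleave freely.
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.w)

    def push(self, grad):
        with self.lock:
            for i, g in enumerate(grad):
                self.w[i] -= self.lr * g

def worker(server, shard, steps):
    for _ in range(steps):
        x, y = random.choice(shard)
        w = server.pull()                      # fetch (possibly stale) params
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        grad = [err * xi for xi in x]          # squared-loss gradient
        server.push(grad)                      # asynchronous update

# Synthetic noiseless data: y = 2*x0 - 3*x1, split across two shards.
random.seed(0)
data = [((x0, x1), 2 * x0 - 3 * x1)
        for x0, x1 in ((random.random(), random.random()) for _ in range(200))]
server = ParameterServer(dim=2)
threads = [threading.Thread(target=worker, args=(server, data[i::2], 2000))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([round(v, 2) for v in server.pull()])  # should approach [2.0, -3.0]
```

The key design point mirrored here is that asynchrony trades consistency for throughput: workers may compute gradients against stale parameters, but because no worker ever waits for another, the system tolerates slow or failed replicas, which the paper reports is what makes the approach scale to tens of thousands of cores.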