Albon, Chris. 2018. Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. O’Reilly Media.
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. “A Neural Probabilistic Language Model.” Journal of Machine Learning Research 3: 1137–55.
Bishop, Christopher M. 2007. Pattern Recognition and Machine Learning. 2nd ed. Springer.
Boyd, Stephen, and Lieven Vandenberghe. 2018. Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge University Press.
Burkov, Andriy. 2019. The Hundred-Page Machine Learning Book.
Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. “On the Properties of Neural Machine Translation: Encoder–Decoder Approaches.” In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 103–11.
Chollet, François. 2021. Deep Learning with Python. 2nd ed. Manning.
Chung, Junyoung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2014. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.” In NIPS 2014 Workshop on Deep Learning.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.”
Dozat, Timothy. 2016. “Incorporating Nesterov Momentum into Adam.” In Proceedings of the 4th International Conference on Learning Representations.
Duchi, John, Elad Hazan, and Yoram Singer. 2011. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” Journal of Machine Learning Research 12: 2121–59.
Elman, Jeffrey L. 1990. “Finding Structure in Time.” Cognitive Science 14: 179–211.
Fukushima, Kunihiko. 1980. “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position.” Biological Cybernetics 36 (4): 193–202.
Fukushima, Kunihiko, Sei Miyake, and Takayuki Ito. 1983. “Neocognitron: A Neural Network Model for a Mechanism of Visual Pattern Recognition.” IEEE Transactions on Systems, Man, and Cybernetics SMC-13 (5): 826–34.
Gallatin, Kyle, and Chris Albon. 2023. Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. 2nd ed. O’Reilly Media.
Gao, W., L. Graesser, K. Choromanski, X. Song, N. Lazic, P. R. Sanketi, V. Sindhwani, and N. Jaitly. 2020. “Robotic Table Tennis with Model-Free Reinforcement Learning.” In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5556–63.
Géron, Aurélien. 2022. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2nd ed. O’Reilly Media.
Girshick, Ross. 2015. “Fast R-CNN.” In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1440–48. Santiago.
Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 580–87.
Goh, Gabriel. 2017. “Why Momentum Really Works.”
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
Graves, Alex. 2014. “Generating Sequences with Recurrent Neural Networks.”
Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. 2017. “LSTM: A Search Space Odyssey.” IEEE Transactions on Neural Networks and Learning Systems 28 (10): 2222–32.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78.
Herculano-Houzel, Suzana. 2009. “The Human Brain in Numbers: A Linearly Scaled-up Primate Brain.” Frontiers in Human Neuroscience 3: 31.
Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2012. “Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors.”
Hochreiter, Sepp. 1991. “Untersuchungen Zu Dynamischen Neuronalen Netzen.” Diplomarbeit, Institut für Informatik, Technische Universität München.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80.
Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” In Proceedings of the 32nd International Conference on Machine Learning (ICML), 448–56.
Jordan, Michael I. 1990. “Attractor Dynamics and Parallelism in a Connectionist Sequential Machine.” In Artificial Neural Networks: Concept Learning, 112–27. IEEE Press.
Khan, A., A. Sohail, U. Zahoora, and A. S. Qureshi. 2020. “A Survey of the Recent Architectures of Deep Convolutional Neural Networks.” Artificial Intelligence Review 53: 5455–5516.
Kingma, Diederik P., and Jimmy Lei Ba. 2015. “Adam: A Method for Stochastic Optimization.” In International Conference on Learning Representations, 1–13.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” Communications of the ACM 60 (6): 84–90.
LeCun, Yann, and Yoshua Bengio. 1998. “Convolutional Networks for Images, Speech, and Time Series.” In The Handbook of Brain Theory and Neural Networks, 255–58. Cambridge, MA, USA: MIT Press.
LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. “Backpropagation Applied to Handwritten Zip Code Recognition.” Neural Computation 1 (4): 541–51.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324.
Lee, J., J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. 2020. “Learning Quadrupedal Locomotion over Challenging Terrain.” Science Robotics 5.
Lei Ba, Jimmy, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. “Layer Normalization.”
McCulloch, Warren, and Walter Pitts. 1943. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Bulletin of Mathematical Biophysics 5: 115–33.
Melville, James. 2016. “Nesterov Accelerated Gradient and Momentum.”
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.”
Minsky, Marvin, and Seymour Papert. 1969. Perceptrons: An Introduction to Computational Geometry. MIT Press.
Minsky, Marvin, and Seymour A. Papert. 2017. Perceptrons: An Introduction to Computational Geometry. Reissue of the 1988 Expanded Edition with a new foreword by Léon Bottou. The MIT Press.
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.”
Nesterov, Yurii. 1983. “A Method for Unconstrained Convex Minimization Problem with the Rate of Convergence O(1/k^2).” Doklady AN SSSR (translated as Soviet Math. Dokl.) 269: 543–47.
Nielsen, Michael. 2015. Neural Networks and Deep Learning.
Ouyang, Long, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, et al. 2022. “Training Language Models to Follow Instructions with Human Feedback.” In Advances in Neural Information Processing Systems, 35:27730–44.
Parr, Terence, and Jeremy Howard. 2018. “The Matrix Calculus You Need for Deep Learning.”
Qian, Ning. 1999. “On the Momentum Term in Gradient Descent Learning Algorithms.” Neural Networks 12 (1): 145–51.
Raschka, Sebastian, Yuxi (Hayden) Liu, and Vahid Mirjalili. 2022. Machine Learning with PyTorch and Scikit-Learn: Develop Machine Learning and Deep Learning Models with Python. Packt Publishing.
Raschka, Sebastian, and Vahid Mirjalili. 2019. Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-Learn, and TensorFlow 2. 3rd ed. Packt Publishing.
Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. “You Only Look Once: Unified, Real-Time Object Detection.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–88.
Redmon, Joseph, and Ali Farhadi. 2017. “YOLO9000: Better, Faster, Stronger.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517–25.
———. 2018. “YOLOv3: An Incremental Improvement.” University of Washington.
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. 2017. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6): 1137–49.
Rosenblatt, Frank. 1957. “The Perceptron: A Perceiving and Recognizing Automaton.” Report 85-460-1. Cornell Aeronautical Laboratory.
———. 1958. “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” Psychological Review 65 (6): 386–408.
Ruder, Sebastian. 2016. “An Overview of Gradient Descent Optimization Algorithms.”
Rumelhart, D., G. Hinton, and R. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323: 533–36.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks: An Overview.” Neural Networks 61: 85–117.
Sermanet, Pierre, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. 2014. “OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks.” In International Conference on Learning Representations (ICLR).
Silver, D., T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, et al. 2017. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.”
Simonyan, Karen, and Andrew Zisserman. 2015. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” In International Conference on Learning Representations (ICLR).
Skiena, Steven S. 2017. The Data Science Design Manual. Springer.
Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research 15 (56): 1929–58.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013. “On the Importance of Initialization and Momentum in Deep Learning.” In Proceedings of the 30th International Conference on Machine Learning (ICML).
Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd ed. Bradford Books.
Trask, Andrew. 2019. Grokking Deep Learning. Manning Publications.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Advances in Neural Information Processing Systems, 30:5998–6008.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.” PhD Thesis, Boston, MA: Harvard University.
———. 1990. “Backpropagation Through Time: What It Does and How to Do It.” Proceedings of the IEEE 78 (10): 1550–60.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear Sensitivity Analysis.” In System Modeling and Optimization, edited by R. F. Drenick and F. Kozin, 762–70. Springer.
Widrow, Bernard. 1960. “An Adaptive ‘ADALINE’ Neuron Using Chemical ‘Memistors’.” Technical Report 1553-2. Solid-State Electronics Laboratory, Stanford University.
Widrow, Bernard, and Michael A. Lehr. 1990. “30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation.” Proceedings of the IEEE 78 (9): 1415–42.
Williams, Ronald J., and David Zipser. 1989. “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks.” Neural Computation 1 (2): 270–80.
Zeiler, Matthew D. 2012. “ADADELTA: An Adaptive Learning Rate Method.”
Zeiler, Matthew D., and Rob Fergus. 2014. “Visualizing and Understanding Convolutional Networks.” In Proceedings of the European Conference on Computer Vision (ECCV), 818–33.
Zhang, Richard, Phillip Isola, and Alexei A. Efros. 2016. “Colorful Image Colorization.” In European Conference on Computer Vision.