References
Albon, Chris. 2018. Machine Learning with Python Cookbook: Practical
Solutions from Preprocessing to Deep Learning. O’Reilly.
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Janvin.
2003. “A Neural Probabilistic Language Model.” Journal
of Machine Learning Research 3: 1137–55. https://dl.acm.org/doi/10.5555/944919.944966.
Bishop, Christopher M. 2006. Pattern Recognition and Machine
Learning. Springer.
Boyd, Stephen, and Lieven Vandenberghe. 2018. Introduction to
Applied Linear Algebra: Vectors, Matrices, and Least Squares.
Cambridge University Press. https://web.stanford.edu/~boyd/vmls/vmls.pdf.
Burkov, Andriy. 2019. The Hundred-Page Machine Learning Book.
https://themlbook.com.
Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua
Bengio. 2014. “On the Properties of Neural Machine Translation:
Encoder–Decoder Approaches.” In Proceedings of SSST-8, Eighth
Workshop on Syntax, Semantics and Structure in Statistical
Translation, 103–11. https://arxiv.org/abs/1409.1259.
Chollet, François. 2021. Deep Learning with Python. 2nd ed.
Manning.
Chung, Junyoung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio.
2014. “Empirical Evaluation of Gated Recurrent Neural Networks on
Sequence Modeling.” In NIPS 2014 Workshop on Deep
Learning. https://arxiv.org/abs/1412.3555.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018.
“BERT: Pre-Training of Deep Bidirectional Transformers for
Language Understanding.” https://arxiv.org/abs/1810.04805.
Dozat, Timothy. 2016. “Incorporating Nesterov Momentum into
Adam.” In Proceedings of the 4th International Conference on
Learning Representations. https://cs229.stanford.edu/proj2015/054_report.pdf.
Duchi, John, Elad Hazan, and Yoram Singer. 2011. “Adaptive
Subgradient Methods for Online Learning and Stochastic
Optimization.” Journal of Machine Learning Research 12:
2121–59. https://dl.acm.org/doi/10.5555/1953048.2021068.
Elman, Jeffrey L. 1990. “Finding Structure in Time.”
Cognitive Science 14 (2): 179–211.
Fukushima, Kunihiko. 1980. “Neocognitron: A Self-Organizing Neural
Network Model for a Mechanism of Pattern Recognition Unaffected by Shift
in Position.” Biological Cybernetics 36 (4): 193–202.
Fukushima, Kunihiko, Sei Miyake, and Takayuki Ito. 1983.
“Neocognitron: A Neural Network Model for a Mechanism of Visual
Pattern Recognition.” IEEE Transactions on Systems, Man, and
Cybernetics SMC-13 (5): 826–34.
Gallatin, Kyle, and Chris Albon. 2023. Machine Learning with Python
Cookbook: Practical Solutions from Preprocessing to Deep Learning.
2nd ed. O’Reilly Media.
Gao, W., L. Graesser, K. Choromanski, X. Song, N. Lazic, P. R. Sanketi,
V. Sindhwani, and N. Jaitly. 2020. “Robotic Table Tennis with
Model-Free Reinforcement Learning.” In IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), 5556–63. https://arxiv.org/abs/2003.14398.
Géron, Aurélien. 2022. Hands-on Machine Learning with Scikit-Learn,
Keras, and TensorFlow: Concepts, Tools, and Techniques to Build
Intelligent Systems. 3rd ed. O’Reilly Media.
Girshick, Ross. 2015. “Fast R-CNN.” In Proceedings of the
IEEE International Conference on Computer Vision (ICCV), 1440–48.
Santiago. https://arxiv.org/abs/1504.08083.
Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014.
“Rich Feature Hierarchies for Accurate Object Detection and
Semantic Segmentation.” In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 580–87. https://arxiv.org/abs/1311.2524.
Goh, Gabriel. 2017. “Why Momentum Really Works.” Distill. https://distill.pub/2017/momentum/.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep
Learning. MIT Press.
Graves, Alex. 2014. “Generating Sequences with Recurrent Neural
Networks.” https://arxiv.org/abs/1308.0850.
Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and
Jürgen Schmidhuber. 2017. “LSTM: A Search Space
Odyssey.” IEEE Transactions on Neural Networks and Learning
Systems 28 (10): 2222–32. https://arxiv.org/abs/1503.04069.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016.
“Deep Residual Learning for Image Recognition.” In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR),
770–78. https://doi.org/10.1109/CVPR.2016.90.
Herculano-Houzel, Suzana. 2009. “The Human Brain in Numbers: A
Linearly Scaled-up Primate Brain.” Frontiers in Human
Neuroscience 3: 31.
Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever,
and Ruslan Salakhutdinov. 2012. “Improving Neural Networks by
Preventing Co-Adaptation of Feature Detectors.” https://arxiv.org/abs/1207.0580.
Hochreiter, Sepp. 1991. “Untersuchungen zu dynamischen neuronalen
Netzen.” Diploma thesis, Institut für Informatik, Technische
Universität München. https://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term
Memory.” Neural Computation 9 (8): 1735–80. https://dl.acm.org/doi/10.1162/neco.1997.9.8.1735.
Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization:
Accelerating Deep Network Training by Reducing Internal Covariate
Shift.” In Proceedings of the 32nd International Conference on
Machine Learning (ICML), 448–56. https://arxiv.org/abs/1502.03167.
Jordan, Michael I. 1990. “Attractor Dynamics and Parallelism in a
Connectionist Sequential Machine.” In Artificial Neural
Networks: Concept Learning, 112–27. IEEE Press.
Khan, A., A. Sohail, U. Zahoora, and A. S. Qureshi. 2020. “A
Survey of the Recent Architectures of Deep Convolutional Neural
Networks.” Artificial Intelligence Review 53: 5455–5516.
https://arxiv.org/abs/1901.06032.
Kingma, Diederik P., and Jimmy Lei Ba. 2015. “Adam: A Method for
Stochastic Optimization.” In International Conference on
Learning Representations, 1–13. https://arxiv.org/abs/1412.6980.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012.
“ImageNet Classification with Deep Convolutional Neural
Networks.” In Advances in Neural Information Processing Systems,
25:1097–1105. https://dl.acm.org/doi/10.1145/3065386.
LeCun, Yann, and Yoshua Bengio. 1998. “Convolutional Networks for
Images, Speech, and Time Series.” In The Handbook of Brain
Theory and Neural Networks, 255–58. Cambridge, MA, USA: MIT Press.
LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W.
Hubbard, and L. D. Jackel. 1989. “Backpropagation Applied to
Handwritten Zip Code Recognition.” Neural Computation 1
(4): 541–51. https://doi.org/10.1162/neco.1989.1.4.541.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998.
“Gradient-Based Learning Applied to Document Recognition.”
Proceedings of the IEEE 86 (11): 2278–2324. https://doi.org/10.1109/5.726791.
Lee, J., J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. 2020.
“Learning Quadrupedal Locomotion over Challenging Terrain.”
Science Robotics 5 (47): eabc5986. https://arxiv.org/abs/2010.11251.
Lei Ba, Jimmy, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016.
“Layer Normalization.” https://arxiv.org/abs/1607.06450.
McCulloch, Warren, and Walter Pitts. 1943. “A Logical Calculus of
the Ideas Immanent in Nervous Activity.” Bulletin of
Mathematical Biophysics 5: 115–33. https://link.springer.com/article/10.1007/BF02478259.
Melville, James. 2016. “Nesterov Accelerated Gradient and
Momentum.” https://jlmelville.github.io/mize/nesterov.html.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013.
“Efficient Estimation of Word Representations in Vector
Space.” https://arxiv.org/abs/1301.3781.
Minsky, Marvin, and Seymour Papert. 1969. Perceptrons: An
Introduction to Computational Geometry. MIT Press.
Minsky, Marvin, and Seymour A. Papert. 2017. Perceptrons: An Introduction to Computational
Geometry. Reissue of the 1988 Expanded Edition with a new
foreword by Léon Bottou. The MIT Press. https://doi.org/10.7551/mitpress/11301.001.0001.
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis
Antonoglou, Daan Wierstra, and Martin A. Riedmiller. 2013.
“Playing Atari with Deep Reinforcement Learning.” https://arxiv.org/abs/1312.5602.
Nesterov, Yurii. 1983. “A Method for Unconstrained Convex
Minimization Problem with the Rate of Convergence O(1/k²).”
Doklady AN SSSR (translated as Soviet Mathematics Doklady) 269:
543–47. https://www.mathnet.ru/links/1c9736b46d36bbe2e3d9b558c2d14728/dan46009.pdf.
Nielsen, Michael. 2015. Neural Networks and Deep Learning.
Determination Press. http://neuralnetworksanddeeplearning.com.
Ouyang, Long, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright,
Pamela Mishkin, Chong Zhang, et al. 2022. “Training Language
Models to Follow Instructions with Human Feedback.” In
Advances in Neural Information Processing Systems, 35:27730–44.
https://arxiv.org/abs/2203.02155.
Parr, Terence, and Jeremy Howard. 2018. “The Matrix Calculus You
Need for Deep Learning.” https://arxiv.org/abs/1802.01528.
Qian, Ning. 1999. “On the Momentum Term in Gradient Descent
Learning Algorithms.” Neural Networks 12 (1): 145–51. https://www.columbia.edu/~nq6/publications/momentum.pdf.
Raschka, Sebastian, Yuxi (Hayden) Liu, and Vahid Mirjalili. 2022.
Machine Learning with PyTorch and Scikit-Learn: Develop Machine
Learning and Deep Learning Models with Python. Packt Publishing.
Raschka, Sebastian, and Vahid Mirjalili. 2019. Python Machine
Learning: Machine Learning and Deep Learning with Python, Scikit-Learn,
and TensorFlow 2. 3rd ed. Packt Publishing.
Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016.
“You Only Look Once: Unified, Real-Time Object Detection.”
In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 779–88. https://arxiv.org/abs/1506.02640.
Redmon, Joseph, and Ali Farhadi. 2017. “YOLO9000: Better, Faster,
Stronger.” In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 6517–25. https://arxiv.org/abs/1612.08242.
———. 2018. “YOLOv3: An Incremental Improvement.” University
of Washington. https://arxiv.org/abs/1804.02767.
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. 2017.
“Faster R-CNN: Towards Real-Time Object Detection with Region
Proposal Networks.” IEEE Transactions on Pattern Analysis and
Machine Intelligence 39 (6): 1137–49. https://arxiv.org/abs/1506.01497.
Rosenblatt, Frank. 1957. “The Perceptron: A Perceiving and
Recognizing Automaton.” Report 85-460-1. Cornell Aeronautical
Laboratory.
———. 1958. “The Perceptron: A Probabilistic Model for Information
Storage and Organization in the Brain.” Psychological
Review 65 (6): 386–408.
Ruder, Sebastian. 2016. “An Overview of Gradient Descent
Optimization Algorithms.” https://arxiv.org/abs/1609.04747.
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986.
“Learning Representations by Back-Propagating Errors.”
Nature 323: 533–36. https://doi.org/10.1038/323533a0.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks: An
Overview.” Neural Networks 61: 85–117. https://arxiv.org/abs/1404.7828.
Sermanet, Pierre, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus,
and Yann LeCun. 2014. “OverFeat: Integrated Recognition,
Localization and Detection Using Convolutional Networks.” In
International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1312.6229.
Silver, D., T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez,
M. Lanctot, et al. 2017. “Mastering Chess and Shogi by Self-Play
with a General Reinforcement Learning Algorithm.” https://arxiv.org/abs/1712.01815.
Simonyan, Karen, and Andrew Zisserman. 2015. “Very Deep
Convolutional Networks for Large-Scale Image Recognition.” In
International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1409.1556.
Skiena, Steven S. 2017. The Data Science Design Manual.
Springer.
Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever,
and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent
Neural Networks from Overfitting.” Journal of Machine
Learning Research 15 (56): 1929–58. https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013.
“On the Importance of Initialization and Momentum in Deep
Learning.” In Proceedings of the 30th International Conference
on Machine Learning (ICML), 1139–47.
https://www.cs.toronto.edu/~gdahl/papers/momentumNesterovDeepLearning.pdf.
Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement
Learning: An Introduction. 2nd ed. Bradford Books.
Trask, Andrew. 2019. Grokking Deep Learning. Manning
Publications.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion
Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017.
“Attention Is All You Need.” In Advances in Neural
Information Processing Systems, 30:5998–6008. https://arxiv.org/abs/1706.03762.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction
and Analysis in the Behavioral Sciences.” PhD thesis, Cambridge, MA:
Harvard University.
———. 1990. “Backpropagation Through Time: What It Does and How to
Do It.” Proceedings of the IEEE 78 (10): 1550–60.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear
Sensitivity Analysis.” In System Modeling and
Optimization, edited by R. F. Drenick and F. Kozin, 762–70.
Springer.
Widrow, Bernard. 1960. “An Adaptive ‘ADALINE’ Neuron Using
Chemical ‘Memistors’.” Technical Report 1553-2. Stanford
Electronics Laboratories, Stanford University.
Widrow, Bernard, and Michael A. Lehr. 1990. “30 Years of Adaptive
Neural Networks: Perceptron, Madaline, and Backpropagation.”
Proceedings of the IEEE 78 (9): 1415–42.
Williams, Ronald J., and David Zipser. 1989. “A Learning Algorithm
for Continually Running Fully Recurrent Neural Networks.”
Neural Computation 1 (2): 270–80.
Zeiler, Matthew D. 2012. “ADADELTA: An Adaptive Learning Rate
Method.” https://arxiv.org/abs/1212.5701.
Zeiler, Matthew D., and Rob Fergus. 2014. “Visualizing and
Understanding Convolutional Networks.” In Proceedings of the
European Conference on Computer Vision (ECCV), 818–33. https://arxiv.org/abs/1311.2901.
Zhang, Richard, Phillip Isola, and Alexei A. Efros. 2016.
“Colorful Image Colorization.” In European Conference
on Computer Vision. https://arxiv.org/abs/1603.08511.