References

Alammar, Jay, and Maarten Grootendorst. 2024. Hands-on Large Language Models: Language Understanding and Generation. O’Reilly Media.
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. “A Neural Probabilistic Language Model.” Journal of Machine Learning Research 3: 1137–55. https://dl.acm.org/doi/10.5555/944919.944966.
Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Springer.
Boyd, Stephen, and Lieven Vandenberghe. 2018. Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge University Press. https://web.stanford.edu/~boyd/vmls/vmls.pdf.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020), 1877–1901. https://arxiv.org/abs/2005.14165.
Burkov, Andriy. 2019. The Hundred-Page Machine Learning Book. https://themlbook.com.
Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. “On the Properties of Neural Machine Translation: Encoder–Decoder Approaches.” In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 103–11. https://arxiv.org/abs/1409.1259.
Chollet, François. 2021. Deep Learning with Python. 2nd ed. Manning.
Chung, Junyoung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2014. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.” In NIPS 2014 Workshop on Deep Learning. https://arxiv.org/abs/1412.3555.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” https://arxiv.org/abs/1810.04805.
Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition.” In Proceedings of the 31st International Conference on Machine Learning (ICML’14), Volume 32. Beijing, China.
Dozat, Timothy. 2016. “Incorporating Nesterov Momentum into Adam.” In International Conference on Learning Representations (ICLR), Workshop Track. https://cs229.stanford.edu/proj2015/054_report.pdf.
Duchi, John, Elad Hazan, and Yoram Singer. 2011. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” Journal of Machine Learning Research 12: 2121–59. https://dl.acm.org/doi/10.5555/1953048.2021068.
Ekman, Magnus. 2021. Learning Deep Learning: Theory and Practice of Neural Networks, Computer Vision, Natural Language Processing, and Transformers Using TensorFlow. Addison-Wesley.
Elman, Jeffrey L. 1990. “Finding Structure in Time.” Cognitive Science 14: 179–211.
Fukushima, Kunihiko. 1980. “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position.” Biological Cybernetics 36 (4): 193–202.
Fukushima, Kunihiko, Sei Miyake, and Takayuki Ito. 1983. “Neocognitron: A Neural Network Model for a Mechanism of Visual Pattern Recognition.” IEEE Transactions on Systems, Man, and Cybernetics SMC-13 (5): 826–34.
Gallatin, Kyle, and Chris Albon. 2023. Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. 2nd ed. O’Reilly Media.
Gao, W., L. Graesser, K. Choromanski, X. Song, N. Lazic, P. R. Sanketi, V. Sindhwani, and N. Jaitly. 2020. “Robotic Table Tennis with Model-Free Reinforcement Learning.” In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5556–63. https://arxiv.org/abs/2003.14398.
Géron, Aurélien. 2022. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 3rd ed. O’Reilly Media.
Girshick, Ross. 2015. “Fast R-CNN.” In IEEE International Conference on Computer Vision (ICCV), 1440–48. Santiago, Chile. https://arxiv.org/abs/1504.08083.
Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 580–87. https://arxiv.org/abs/1311.2524.
Goh, Gabriel. 2017. “Why Momentum Really Works.” Distill. https://distill.pub/2017/momentum/.
Graves, Alex. 2014. “Generating Sequences with Recurrent Neural Networks.” https://arxiv.org/abs/1308.0850.
Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. 2017. “LSTM: A Search Space Odyssey.” IEEE Transactions on Neural Networks and Learning Systems 28 (10): 2222–32. https://arxiv.org/abs/1503.04069.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78. https://doi.org/10.1109/CVPR.2016.90.
Herculano-Houzel, Suzana. 2009. “The Human Brain in Numbers: A Linearly Scaled-up Primate Brain.” Frontiers in Human Neuroscience 3: 31.
Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2012. “Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors.” https://arxiv.org/abs/1207.0580.
Hochreiter, Sepp. 1991. “Untersuchungen zu dynamischen neuronalen Netzen.” Diploma thesis, Institut für Informatik, Technische Universität München. https://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80. https://dl.acm.org/doi/10.1162/neco.1997.9.8.1735.
Howard, Jeremy, and Sebastian Ruder. 2018. “Universal Language Model Fine-Tuning for Text Classification.” In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 328–39. https://arxiv.org/abs/1801.06146.
Hugging Face. 2022. “The Hugging Face Course.” https://huggingface.co/course.
Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” In Proceedings of the 32nd International Conference on Machine Learning (ICML), 448–56. https://arxiv.org/abs/1502.03167.
Jelinek, Frederick. 1997. Statistical Methods for Speech Recognition. MIT Press.
Jordan, Michael I. 1990. “Attractor Dynamics and Parallelism in a Connectionist Sequential Machine.” In Artificial Neural Networks: Concept Learning, 112–27. IEEE Press.
Jurafsky, Daniel, and James H. Martin. 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall.
Khan, A., A. Sohail, U. Zahoora, and A. S. Qureshi. 2020. “A Survey of the Recent Architectures of Deep Convolutional Neural Networks.” Artificial Intelligence Review 53: 5455–5516. https://arxiv.org/abs/1901.06032.
Kingma, Diederik P., and Jimmy Lei Ba. 2015. “Adam: A Method for Stochastic Optimization.” In International Conference on Learning Representations, 1–13. https://arxiv.org/abs/1412.6980.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” Communications of the ACM 60 (6): 84–90. https://dl.acm.org/doi/10.1145/3065386.
LeCun, Yann, and Yoshua Bengio. 1998. “Convolutional Networks for Images, Speech, and Time Series.” In The Handbook of Brain Theory and Neural Networks, 255–58. Cambridge, MA, USA: MIT Press.
LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. “Backpropagation Applied to Handwritten Zip Code Recognition.” Neural Computation 1 (4): 541–51. https://doi.org/10.1162/neco.1989.1.4.541.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. https://doi.org/10.1109/5.726791.
Lee, J., J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. 2020. “Learning Quadrupedal Locomotion over Challenging Terrain.” Science Robotics 5 (47): eabc5986. https://arxiv.org/abs/2010.11251.
Lei Ba, Jimmy, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. “Layer Normalization.” https://arxiv.org/abs/1607.06450.
McCulloch, Warren, and Walter Pitts. 1943. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Bulletin of Mathematical Biophysics 5: 115–33. https://link.springer.com/article/10.1007/BF02478259.
Melville, James. 2016. “Nesterov Accelerated Gradient and Momentum.” https://jlmelville.github.io/mize/nesterov.html.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” https://arxiv.org/abs/1301.3781.
Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. 2013. “Linguistic Regularities in Continuous Space Word Representations.” In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 746–51. Atlanta, Georgia: Association for Computational Linguistics. https://aclanthology.org/N13-1090.
Minsky, Marvin, and Seymour Papert. 1969. Perceptrons: An Introduction to Computational Geometry. MIT Press.
Minsky, Marvin, and Seymour A. Papert. 2017. Perceptrons: An Introduction to Computational Geometry. Reissue of the 1988 Expanded Edition with a new foreword by Léon Bottou. The MIT Press. https://doi.org/10.7551/mitpress/11301.001.0001.
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.” https://arxiv.org/abs/1312.5602.
Nesterov, Yurii. 1983. “A Method for Unconstrained Convex Minimization Problem with the Rate of Convergence O(1/k²).” Doklady AN SSSR (translated as Soviet Mathematics Doklady) 269: 543–47. https://www.mathnet.ru/links/1c9736b46d36bbe2e3d9b558c2d14728/dan46009.pdf.
Nielsen, Michael. 2015. Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com.
Ouyang, Long, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, et al. 2022. “Training Language Models to Follow Instructions with Human Feedback.” In Advances in Neural Information Processing Systems, 35:27730–44. https://arxiv.org/abs/2203.02155.
Parr, Terence, and Jeremy Howard. 2018. “The Matrix Calculus You Need for Deep Learning.” https://arxiv.org/abs/1802.01528.
Qian, Ning. 1999. “On the Momentum Term in Gradient Descent Learning Algorithms.” Neural Networks 12 (1): 145–51. https://www.columbia.edu/~nq6/publications/momentum.pdf.
Radford, Alec, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. “Improving Language Understanding by Generative Pre-Training.” OpenAI. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners.” OpenAI. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
Raschka, Sebastian. 2024. Build a Large Language Model from Scratch. Manning Publications.
Raschka, Sebastian, Yuxi (Hayden) Liu, and Vahid Mirjalili. 2022. Machine Learning with PyTorch and Scikit-Learn: Develop Machine Learning and Deep Learning Models with Python. Packt Publishing.
Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. “You Only Look Once: Unified, Real-Time Object Detection.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–88. https://arxiv.org/abs/1506.02640.
Redmon, Joseph, and Ali Farhadi. 2017. “YOLO9000: Better, Faster, Stronger.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517–25. https://arxiv.org/abs/1612.08242.
———. 2018. “YOLOv3: An Incremental Improvement.” University of Washington. https://arxiv.org/abs/1804.02767.
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. 2017. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6): 1137–49. https://arxiv.org/abs/1506.01497.
Rosenblatt, Frank. 1957. “The Perceptron: A Perceiving and Recognizing Automaton.” Report 85-460-1. Cornell Aeronautical Laboratory.
———. 1958. “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” Psychological Review 65 (6): 386–408.
Ruder, Sebastian. 2016. “An Overview of Gradient Descent Optimization Algorithms.” https://arxiv.org/abs/1609.04747.
Rumelhart, D., G. Hinton, and R. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323: 533–36. https://doi.org/10.1038/323533a0.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks: An Overview.” Neural Networks 61: 85–117. https://arxiv.org/abs/1404.7828.
Sermanet, Pierre, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. 2014. “OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks.” In International Conference on Learning Representations (ICLR).
Silver, D., T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, et al. 2017. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.” https://arxiv.org/abs/1712.01815.
Simonyan, Karen, and Andrew Zisserman. 2015. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” In International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1409.1556.
Skiena, Steven S. 2017. The Data Science Design Manual. Springer.
Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research 15 (56): 1929–58. https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013. “On the Importance of Initialization and Momentum in Deep Learning.” In Proceedings of the 30th International Conference on Machine Learning (ICML). https://www.cs.toronto.edu/~gdahl/papers/momentumNesterovDeepLearning.pdf.
Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd ed. Bradford Books.
Trask, Andrew. 2019. Grokking Deep Learning. Manning Publications.
Tunstall, Lewis, Leandro von Werra, and Thomas Wolf. 2022. Natural Language Processing with Transformers: Building Language Applications with Hugging Face. O’Reilly Media.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Advances in Neural Information Processing Systems, 30:5998–6008. https://arxiv.org/abs/1706.03762.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.” PhD thesis, Cambridge, MA: Harvard University.
———. 1990. “Backpropagation Through Time: What It Does and How to Do It.” Proceedings of the IEEE 78 (10): 1550–60.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear Sensitivity Analysis.” In System Modeling and Optimization, edited by R. F. Drenick and F. Kozin, 762–70. Springer.
Widrow, Bernard. 1960. “An Adaptive ‘ADALINE’ Neuron Using Chemical ‘Memistors’.” Technical Report 1553-2. Solid-State Electronics Laboratory, Stanford University.
Widrow, Bernard, and Michael A. Lehr. 1990. “30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation.” Proceedings of the IEEE 78 (9): 1415–42.
Williams, Ronald J., and David Zipser. 1989. “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks.” Neural Computation 1 (2): 270–80.
Yang, Jingfeng, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, and Xia Hu. 2023. “Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond.” https://arxiv.org/abs/2304.13712.
Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. “How Transferable Are Features in Deep Neural Networks?” In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, 3320–28. NIPS’14. Cambridge, MA, USA: MIT Press. https://dl.acm.org/doi/10.5555/2969033.2969197.
Zeiler, Matthew D. 2012. “ADADELTA: An Adaptive Learning Rate Method.” https://arxiv.org/abs/1212.5701.
Zeiler, Matthew D., and Rob Fergus. 2014. “Visualizing and Understanding Convolutional Networks.” In European Conference on Computer Vision (ECCV), 818–33. https://arxiv.org/abs/1311.2901.
Zhang, Richard, Phillip Isola, and Alexei A. Efros. 2016. “Colorful Image Colorization.” In European Conference on Computer Vision (ECCV). https://arxiv.org/abs/1603.08511.