References
Alammar, Jay, and Maarten Grootendorst. 2024. Hands-on Large
Language Models: Language Understanding and Generation. O’Reilly Media.
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Janvin.
2003. “A Neural Probabilistic Language Model.” Journal
of Machine Learning Research 3: 1137–55. https://dl.acm.org/doi/10.5555/944919.944966.
Bishop, Christopher M. 2007. Pattern Recognition and Machine
Learning. Springer.
Boyd, Stephen, and Lieven Vandenberghe. 2018. Introduction to
Applied Linear Algebra: Vectors, Matrices, and Least Squares.
Cambridge University Press. https://web.stanford.edu/~boyd/vmls/vmls.pdf.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan,
Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language
Models Are Few-Shot Learners.” In Proceedings of the 34th
International Conference on Neural Information Processing Systems
(NeurIPS 2020), 1877–1901. https://arxiv.org/abs/2005.14165.
Burkov, Andriy. 2019. The Hundred-Page Machine Learning Book.
https://themlbook.com.
Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua
Bengio. 2014. “On the Properties of Neural Machine Translation:
Encoder–Decoder Approaches.” In Proceedings of SSST-8, Eighth
Workshop on Syntax, Semantics and Structure in Statistical
Translation, 103–11. https://arxiv.org/abs/1409.1259.
Chollet, François. 2021. Deep Learning with Python. 2nd ed.
Manning Publications.
Chung, Junyoung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio.
2014. “Empirical Evaluation of Gated Recurrent Neural Networks on
Sequence Modeling.” In NIPS 2014 Workshop on Deep
Learning. https://arxiv.org/abs/1412.3555.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018.
“BERT: Pre-Training of Deep Bidirectional Transformers for
Language Understanding.” https://arxiv.org/abs/1810.04805.
Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang,
Eric Tzeng, and Trevor Darrell. 2014. “DeCAF: A Deep Convolutional
Activation Feature for Generic Visual Recognition.” In
Proceedings of the 31st International Conference on Machine
Learning (ICML), Volume 32. Beijing, China.
Dozat, Timothy. 2016. “Incorporating Nesterov Momentum into
Adam.” In Proceedings of the 4th International Conference on
Learning Representations. https://cs229.stanford.edu/proj2015/054_report.pdf.
Duchi, John, Elad Hazan, and Yoram Singer. 2011. “Adaptive
Subgradient Methods for Online Learning and Stochastic
Optimization.” Journal of Machine Learning Research 12:
2121–59. https://dl.acm.org/doi/10.5555/1953048.2021068.
Ekman, Magnus. 2021. Learning Deep Learning: Theory and Practice of
Neural Networks, Computer Vision, Natural Language Processing, and
Transformers Using TensorFlow. Addison-Wesley.
Elman, Jeffrey L. 1990. “Finding Structure in Time.”
Cognitive Science 14: 179–211.
Fukushima, Kunihiko. 1980. “Neocognitron: A Self-Organizing Neural
Network Model for a Mechanism of Pattern Recognition Unaffected by Shift
in Position.” Biological Cybernetics 36 (4): 193–202.
Fukushima, Kunihiko, Sei Miyake, and Takayuki Ito. 1983.
“Neocognitron: A Neural Network Model for a Mechanism of Visual
Pattern Recognition.” IEEE Transactions on Systems, Man, and
Cybernetics SMC-13 (5): 826–34.
Gallatin, Kyle, and Chris Albon. 2023. Machine Learning with Python
Cookbook: Practical Solutions from Preprocessing to Deep Learning.
2nd ed. O’Reilly Media.
Gao, W., L. Graesser, K. Choromanski, X. Song, N. Lazic, P. R. Sanketi,
V. Sindhwani, and N. Jaitly. 2020. “Robotic Table Tennis with
Model-Free Reinforcement Learning.” In IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), 5556–63. https://arxiv.org/abs/2003.14398.
Géron, Aurélien. 2022. Hands-on Machine Learning with Scikit-Learn,
Keras, and TensorFlow: Concepts, Tools, and Techniques to Build
Intelligent Systems. 3rd ed. O’Reilly Media.
Girshick, Ross. 2015. “Fast R-CNN.” In IEEE International
Conference on Computer Vision (ICCV), 1440–48. Santiago, Chile. https://arxiv.org/abs/1504.08083.
Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014.
“Rich Feature Hierarchies for Accurate Object Detection and
Semantic Segmentation.” In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 580–87. https://arxiv.org/abs/1311.2524.
Goh, Gabriel. 2017. “Why Momentum Really Works.” https://distill.pub/2017/momentum/.
Graves, Alex. 2014. “Generating Sequences with Recurrent Neural
Networks.” https://arxiv.org/abs/1308.0850.
Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and
Jürgen Schmidhuber. 2017. “LSTM: A Search Space
Odyssey.” IEEE Transactions on Neural Networks and Learning
Systems 28 (10): 2222–32. https://arxiv.org/abs/1503.04069.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016.
“Deep Residual Learning for Image Recognition.” In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR),
770–78. https://doi.org/10.1109/CVPR.2016.90.
Herculano-Houzel, Suzana. 2009. “The Human Brain in Numbers: A
Linearly Scaled-up Primate Brain.” Frontiers in Human
Neuroscience 3: 31.
Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever,
and Ruslan Salakhutdinov. 2012. “Improving Neural Networks by
Preventing Co-Adaptation of Feature Detectors.” https://arxiv.org/abs/1207.0580.
Hochreiter, Sepp. 1991. “Untersuchungen zu dynamischen neuronalen
Netzen” [Investigations of Dynamic Neural Networks]. Diploma thesis,
Institut für Informatik, Technische Universität München. https://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term
Memory.” Neural Computation 9 (8): 1735–80. https://dl.acm.org/doi/10.1162/neco.1997.9.8.1735.
Howard, Jeremy, and Sebastian Ruder. 2018. “Universal Language
Model Fine-Tuning for Text Classification.” In Proceedings of
the 56th Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers), 328–39. https://arxiv.org/abs/1801.06146.
Hugging Face. 2022. “The Hugging Face Course.” https://huggingface.co/course.
Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization:
Accelerating Deep Network Training by Reducing Internal Covariate
Shift.” In Proceedings of the 32nd International Conference on
Machine Learning (ICML), 448–56. https://arxiv.org/abs/1502.03167.
Jelinek, Frederick. 1997. Statistical Methods for Speech
Recognition. MIT Press.
Jordan, Michael I. 1990. “Attractor Dynamics and Parallelism in a
Connectionist Sequential Machine.” In Artificial Neural
Networks: Concept Learning, 112–27. IEEE Press.
Jurafsky, Daniel, and James H. Martin. 2008. Speech and Language
Processing: An Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition. Prentice Hall.
Khan, A., A. Sohail, U. Zahoora, and A. S. Qureshi. 2020. “A
Survey of the Recent Architectures of Deep Convolutional Neural
Networks.” Artificial Intelligence Review 53: 5455–5516.
https://arxiv.org/abs/1901.06032.
Kingma, Diederik P., and Jimmy Lei Ba. 2015. “Adam: A Method for
Stochastic Optimization.” In International Conference on
Learning Representations, 1–13. https://arxiv.org/abs/1412.6980.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012.
“ImageNet Classification with Deep Convolutional Neural
Networks.” In Advances in Neural Information Processing Systems, 25:1097–1105. https://dl.acm.org/doi/10.1145/3065386.
LeCun, Yann, and Yoshua Bengio. 1998. “Convolutional Networks for
Images, Speech, and Time Series.” In The Handbook of Brain
Theory and Neural Networks, 255–58. Cambridge, MA, USA: MIT Press.
LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W.
Hubbard, and L. D. Jackel. 1989. “Backpropagation Applied to
Handwritten Zip Code Recognition.” Neural Computation 1
(4): 541–51. https://doi.org/10.1162/neco.1989.1.4.541.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998.
“Gradient-Based Learning Applied to Document Recognition.”
Proceedings of the IEEE 86 (11): 2278–2324. https://doi.org/10.1109/5.726791.
Lee, J., J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter. 2020.
“Learning Quadrupedal Locomotion over Challenging Terrain.”
Science Robotics 5. https://arxiv.org/abs/2010.11251.
Lei Ba, Jimmy, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016.
“Layer Normalization.” https://arxiv.org/abs/1607.06450.
McCulloch, Warren, and Walter Pitts. 1943. “A Logical Calculus of
the Ideas Immanent in Nervous Activity.” Bulletin of
Mathematical Biophysics 5: 115–33. https://link.springer.com/article/10.1007/BF02478259.
Melville, James. 2016. “Nesterov Accelerated Gradient and
Momentum.” https://jlmelville.github.io/mize/nesterov.html.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013.
“Efficient Estimation of Word Representations in Vector
Space.” https://arxiv.org/abs/1301.3781.
Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. 2013. “Linguistic
Regularities in Continuous Space Word Representations.” In
Proceedings of the 2013 Conference of the North
American Chapter of the Association for Computational
Linguistics: Human Language Technologies, 746–51. Atlanta, Georgia:
Association for Computational Linguistics. https://aclanthology.org/N13-1090.
Minsky, Marvin, and Seymour Papert. 1969. Perceptrons: An
Introduction to Computational Geometry. MIT Press.
Minsky, Marvin, and Seymour A. Papert. 2017. Perceptrons: An Introduction to Computational
Geometry. Reissue of the 1988 Expanded Edition with a new
foreword by Léon Bottou. The MIT Press. https://doi.org/10.7551/mitpress/11301.001.0001.
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis
Antonoglou, Daan Wierstra, and Martin A. Riedmiller. 2013.
“Playing Atari with Deep Reinforcement Learning.” https://arxiv.org/abs/1312.5602.
Nesterov, Yurii. 1983. “A Method for Unconstrained Convex
Minimization Problem with the Rate of Convergence O(1/k²).”
Doklady AN SSSR (translated as Soviet Mathematics Doklady) 269:
543–47. https://www.mathnet.ru/links/1c9736b46d36bbe2e3d9b558c2d14728/dan46009.pdf.
Nielsen, Michael. 2015. Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com.
Ouyang, Long, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright,
Pamela Mishkin, Chong Zhang, et al. 2022. “Training Language
Models to Follow Instructions with Human Feedback.” In
Advances in Neural Information Processing Systems, 35:27730–44.
https://arxiv.org/abs/2203.02155.
Parr, Terence, and Jeremy Howard. 2018. “The Matrix Calculus You
Need for Deep Learning.” https://arxiv.org/abs/1802.01528.
Qian, Ning. 1999. “On the Momentum Term in Gradient Descent
Learning Algorithms.” Neural Networks 12 (1): 145–51. https://www.columbia.edu/~nq6/publications/momentum.pdf.
Radford, Alec, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever.
2018. “Improving Language Understanding by Generative
Pre-Training.” OpenAI. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and
Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask
Learners.” OpenAI. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
Raschka, Sebastian. 2024. Build a Large Language Model from
Scratch. Manning Publications.
Raschka, Sebastian, Yuxi (Hayden) Liu, and Vahid Mirjalili. 2022.
Machine Learning with PyTorch and Scikit-Learn: Develop Machine
Learning and Deep Learning Models with Python. Packt Publishing.
Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016.
“You Only Look Once: Unified, Real-Time Object Detection.”
In IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 779–88. https://arxiv.org/abs/1506.02640.
Redmon, Joseph, and Ali Farhadi. 2017. “YOLO9000: Better, Faster,
Stronger.” In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 6517–25. https://arxiv.org/abs/1612.08242.
———. 2018. “YOLOv3: An Incremental Improvement.” University
of Washington. https://arxiv.org/abs/1804.02767.
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. 2017.
“Faster R-CNN: Towards Real-Time Object Detection with Region
Proposal Networks.” IEEE Transactions on Pattern Analysis and
Machine Intelligence 39 (6): 1137–49. https://arxiv.org/abs/1506.01497.
Rosenblatt, Frank. 1957. “The Perceptron: A Perceiving and
Recognizing Automaton.” Report 85-460-1. Cornell Aeronautical
Laboratory.
———. 1958. “The Perceptron: A Probabilistic Model for Information
Storage and Organization in the Brain.” Psychological
Review 65 (6): 386–408.
Ruder, Sebastian. 2016. “An Overview of Gradient Descent
Optimization Algorithms.” https://arxiv.org/abs/1609.04747.
Rumelhart, D., G. Hinton, and R. Williams. 1986. “Learning
Representations by Back-Propagating Errors.” Nature 323:
533–36. https://doi.org/10.1038/323533a0.
Schmidhuber, Jürgen. 2015. “Deep Learning in Neural Networks: An
Overview.” Neural Networks 61: 85–117. https://arxiv.org/abs/1404.7828.
Sermanet, Pierre, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus,
and Yann LeCun. 2014. “OverFeat: Integrated Recognition,
Localization and Detection Using Convolutional Networks.” In
International Conference on Learning Representations (ICLR).
Silver, D., T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez,
M. Lanctot, et al. 2017. “Mastering Chess and Shogi by Self-Play
with a General Reinforcement Learning Algorithm.” https://arxiv.org/abs/1712.01815.
Simonyan, Karen, and Andrew Zisserman. 2015. “Very Deep
Convolutional Networks for Large-Scale Image Recognition.” In
International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1409.1556.
Skiena, Steven S. 2017. The Data Science Design Manual.
Springer.
Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever,
and Ruslan Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent
Neural Networks from Overfitting.” Journal of Machine
Learning Research 15 (56): 1929–58. https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf.
Sutskever, Ilya, James Martens, George Dahl, and Geoffrey Hinton. 2013.
“On the Importance of Initialization and Momentum in Deep
Learning.” In Proceedings of the 30th International Conference on
Machine Learning (ICML).
https://www.cs.toronto.edu/~gdahl/papers/momentumNesterovDeepLearning.pdf.
Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement
Learning: An Introduction. 2nd ed. Bradford Books.
Trask, Andrew. 2019. Grokking Deep Learning. Manning
Publications.
Tunstall, Lewis, Leandro von Werra, and Thomas Wolf. 2022. Natural
Language Processing with Transformers: Building Language Applications
with Hugging Face. O’Reilly Media.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion
Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017.
“Attention Is All You Need.” In Advances in Neural
Information Processing Systems, 30:5998–6008. https://arxiv.org/abs/1706.03762.
Werbos, Paul. 1974. “Beyond Regression: New Tools for Prediction
and Analysis in the Behavioral Sciences.” PhD thesis, Harvard
University, Cambridge, MA.
———. 1990. “Backpropagation Through Time: What It Does and How to
Do It.” Proceedings of the IEEE 78 (10): 1550–60.
Werbos, Paul J. 1982. “Applications of Advances in Nonlinear
Sensitivity Analysis.” In System Modeling and
Optimization, edited by R. F. Drenick and F. Kozin, 762–70.
Springer.
Widrow, Bernard. 1960. “An Adaptive ‘ADALINE’ Neuron Using
Chemical ‘Memistors’.” Technical Report 1553-2. Solid-State
Electronics Laboratory, Stanford University.
Widrow, Bernard, and Michael A. Lehr. 1990. “30 Years of Adaptive
Neural Networks: Perceptron, Madaline, and Backpropagation.”
Proceedings of the IEEE 78 (9): 1415–42.
Williams, Ronald J., and David Zipser. 1989. “A Learning Algorithm
for Continually Running Fully Recurrent Neural Networks.”
Neural Computation 1 (2): 270–80.
Yang, Jingfeng, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng,
Haoming Jiang, Bing Yin, and Xia Hu. 2023. “Harnessing the Power
of LLMs in Practice: A Survey on ChatGPT and Beyond.” https://arxiv.org/abs/2304.13712.
Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014.
“How Transferable Are Features in Deep Neural Networks?” In
Proceedings of the 27th International Conference on Neural
Information Processing Systems - Volume 2, 3320–28. NIPS’14.
Cambridge, MA, USA: MIT Press. https://dl.acm.org/doi/10.5555/2969033.2969197.
Zeiler, Matthew D. 2012. “ADADELTA: An Adaptive Learning Rate
Method.” https://arxiv.org/abs/1212.5701.
Zeiler, Matthew D., and Rob Fergus. 2014. “Visualizing and
Understanding Convolutional Networks.” In Proceedings of the
European Conference on Computer Vision (ECCV), 818–33. https://arxiv.org/abs/1311.2901.
Zhang, Richard, Phillip Isola, and Alexei A. Efros. 2016.
“Colorful Image Colorization.” In European Conference
on Computer Vision. https://arxiv.org/abs/1603.08511.