A roadmap for software engineers using AI
Software engineer? Interested in AI? Here is a roadmap showing different levels of increasingly advanced experience with AI.
AI is presenting a vast world of opportunities for innovation. The future software engineer will almost certainly have some working knowledge of AI technology.
But it’s vast. Machine learning has a different stack of concepts and technology from conventional software engineering: it includes datasets, training, prediction models (mostly deep neural networks), inference and model deployment. It is data-centric and often demands specialist hardware (GPUs, AI accelerators). Opportunities for innovation in machine learning abound, and many of them benefit from a solid understanding of the mathematics behind it. Here are but a few:
Innovation in prompt engineering: the natural language prompts sent to AI service APIs can significantly impact the performance of the results in ways that are completely new to conventional APIs.
Innovation in prediction model optimisation: a growing number of tools can reduce the size of a model (e.g. quantisation and distillation), improve its quality (e.g. retrieval-augmented generation, fine-tuning), and even improve its performance (such as C++ / CPU-based solutions).
Innovation in improvement of the building blocks: FlashAttention is an example of iteration on transformers that looks to improve the performance significantly by deeply optimising for how GPU memory works.
Innovation in new building blocks for neural networks: Transformers (with Attention) have changed the world, enabling large language models. Mamba looks to be the next major iteration on transformers because it’s significantly more efficient (lower computational complexity).
Innovation in data sourcing, ethics (including human/AI alignment) and provenance: synthetic data generation, better learning and fine-tuning approaches for AIs that are aligned with humanity.
Innovation in machine learning approaches: deep neural networks (DNNs) and reinforcement learning are in vogue today, but other areas are showing promise, likely in tandem with DNNs, including planning, spiking neural networks (“biologically plausible” networks) and capsule networks (Hinton’s work on “next-gen convolutional neural networks”).
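To make one of these areas concrete: quantisation, mentioned under model optimisation above, shrinks a model by storing its weights in fewer bits. Here is a minimal sketch of symmetric int8 quantisation in plain numpy; it is illustrative only (real toolchains do this per-layer, with calibration data):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric int8 quantisation: map floats into [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32, and the round-trip
# error per weight is at most half the quantisation step (scale / 2).
```

The trade-off is visible immediately: one float `scale` plus int8 weights replace the float32 tensor, at the cost of a small, bounded reconstruction error.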
As such, a software engineer may find the prospect of engaging with AI fairly daunting. But the reality is, it’s all a matter of degree and can be taken on project by project over time. Below is a simple multi-level structure with increasing levels of sophistication and what a software engineer might do.
Level 1: API as a service. Use a service that happens to use AI, through a conventional structured API. E.g. Whisper (hosted on Hugging Face) is a service that takes audio files and returns transcriptions in JSON. No AI knowledge required.
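At this level, the call looks like any other HTTP API. The sketch below posts an audio file to a Hugging Face-hosted Whisper model; the exact URL, token handling and response shape are assumptions based on the typical inference-API pattern and may differ from the current service:

```python
import json
import urllib.request

# Assumed endpoint shape for a hosted Whisper model (illustrative only).
API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"

def transcribe(audio_path: str, token: str) -> str:
    """POST raw audio bytes; the service returns JSON like {"text": "..."}."""
    with open(audio_path, "rb") as f:
        req = urllib.request.Request(
            API_URL,
            data=f.read(),
            headers={"Authorization": f"Bearer {token}"},
        )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

Note there is nothing AI-specific here: bytes in, JSON out, same as any web service.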
Level 2: Prompt-based API as a service. Use AI through a prompt-based API; knowledge of natural language prompting (“prompt engineering”) is required. E.g. OpenAI’s conversational API: the API call contains natural language prompts. Prompt design, and processing of the results, is specific to the particular model and API. No knowledge of ML frameworks like pytorch or tensorflow is needed.
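The difference from level 1 is that part of the “code” is now natural language. A sketch against OpenAI’s chat completions endpoint (the model name and exact response shape are assumptions; check the current API reference):

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(system_prompt: str, user_prompt: str,
                  model: str = "gpt-4o-mini") -> dict:
    """The prompt engineering lives here: both messages are plain natural
    language, and small wording changes can alter the output."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

def chat(system_prompt: str, user_prompt: str, api_key: str) -> str:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(system_prompt, user_prompt)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Notice that `build_payload` is where the engineering effort shifts: iterating on prompt wording replaces iterating on parameter values.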
Level 3: changing existing AI applications: working with existing applications that use pytorch, tensorflow, jax etc. For instance, deforum is an open source tool that uses pytorch to create animated images with generative AI. To use the library you need to understand nearly 150 different parameters that decide how images are created, and many of them assume knowledge of how neural networks operate (e.g. temperature, grad_inject_timing, kernel schedule).
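One of those parameters, temperature, is easy to see in isolation: it rescales a model’s logits before the softmax, sharpening or flattening the distribution that tokens (or pixels) are sampled from. A minimal numpy sketch:

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Divide logits by temperature before softmax: T < 1 sharpens the
    distribution (more deterministic), T > 1 flattens it (more varied)."""
    z = logits / temperature
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
sharp = softmax_with_temperature(logits, 0.2)  # near-certain top choice
flat = softmax_with_temperature(logits, 5.0)   # close to uniform
```

Many of deforum’s other parameters are like this: a one-line mathematical idea, but you need to know where in the network it applies to predict its effect.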
Level 4: building your own (nontrivial) neural network. Here you’re writing pytorch/tensorflow/jax code to construct a model architecture yourself, from building blocks such as “Transformer” and “Fully Connected” layers; image segmentation tutorials are a good example of this. You need to understand how neural networks work and how to train them. This is firmly in the “machine learning/deep learning engineer” space.
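At this level the code itself is the architecture. A small pytorch sketch assembling standard building blocks into a toy text classifier (all hyperparameters here are illustrative, not tuned):

```python
import torch
from torch import nn

class TinyTransformerClassifier(nn.Module):
    """Embedding -> Transformer encoder -> fully connected head."""
    def __init__(self, vocab_size=1000, d_model=64, n_heads=4,
                 n_layers=2, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):                # tokens: (batch, seq_len) int64
        x = self.embed(tokens)                # (batch, seq_len, d_model)
        x = self.encoder(x)                   # (batch, seq_len, d_model)
        return self.head(x.mean(dim=1))       # mean-pool -> (batch, n_classes)

model = TinyTransformerClassifier()
logits = model(torch.randint(0, 1000, (8, 16)))  # 8 sequences of length 16
```

Training this (loss function, optimiser, data loading) is where the “how to train them” knowledge comes in; the architecture is only half the job.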
Level 5: building your own neural network components, possibly inspired by research papers. E.g. the paper “1-bit transformers” introduces some new building blocks. See Kye Gomez’s implementation of the paper, which includes new components such as a BitLinear layer, a BitNetTransformer, and a BitFeedForward. On top of a solid working knowledge of how neural networks operate, a deep, detailed understanding is needed of the maths behind neural networks and how to design, build and test them.

Level 6: your own research. The output of this role is primarily academic papers presenting new algorithms. Reference implementations are more often than not included, but not always.
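To give a flavour of what a level 5 component looks like, here is a heavily simplified BitLinear-style layer in pytorch: weights are binarised to ±1 (times a scale factor) in the forward pass, with a straight-through estimator so gradients still update the full-precision weights. This is a sketch of the idea only, not the paper’s or Kye Gomez’s actual implementation:

```python
import torch
from torch import nn

class BitLinearSketch(nn.Module):
    """Linear layer whose weights are binarised to {-1, +1} * scale in the
    forward pass; full-precision weights are kept for gradient updates."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        scale = self.weight.abs().mean()          # per-tensor scale factor
        w_bin = torch.sign(self.weight) * scale   # 1-bit weights (plus scale)
        # Straight-through estimator: forward uses w_bin,
        # backward behaves as if w were the full-precision weight.
        w = self.weight + (w_bin - self.weight).detach()
        return x @ w.t()

layer = BitLinearSketch(16, 8)
out = layer(torch.randn(4, 16))
```

Designing a component like this is exactly where the maths matters: you have to reason about what the binarisation does to gradient flow and training stability, not just whether the shapes line up.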
What do you think? This is v1 of “how to start using AI as a software engineer”. I’m sure I’m missing a few pieces and welcome others’ input.