Wednesday, June 3, 2026

OFFENSIVE SECURITY / MALWARE ANALYSIS / REVERSE ENGINEERING Concept Reference List — Complete Edition

================================================================
RECOMMENDED LEARNING PATH
================================================================
1. C Programming
2. Assembly (x86/x64)
3. PE Format & Windows Internals
4. Debugging & Dynamic Analysis
5. Reverse Engineering
6. Shellcode Engineering
7. Exploit Development
8. Malware Internals & Code Injection
9. EDR Evasion Concepts
10. Kernel Mode Programming
11. Active Directory Tradecraft
12. Firmware / Hypervisor Research

================================================================
SECTION A — FOUNDATIONS
================================================================

----------------------------------------------------------------
A1. PE FILE INTERNALS
----------------------------------------------------------------
- DOS Header / NT Headers
Every PE starts with IMAGE_DOS_HEADER (MZ magic), then IMAGE_NT_HEADERS
containing the file and optional headers

- Section Headers & Alignment
.text (code), .data, .rdata, .rsrc; each section has a raw (on-disk) size
rounded to FileAlignment and a virtual size rounded to SectionAlignment

- Import Table (IAT / INT)
List of DLLs and functions the binary needs; the INT holds names/ordinals and
the loader writes the resolved addresses into the IAT at startup

- Export Table
Functions a DLL exposes to callers; has name, ordinal, and address arrays

- Relocations
Base relocation table used when image can't load at preferred base address

- TLS Callbacks
Thread Local Storage callbacks run BEFORE the entry point — common anti-debug
trick since many debuggers break at EP, not TLS

- Delayed Imports
Imports resolved lazily at first call rather than at load time

- Forwarded Exports
An export that redirects to a function in another DLL
(e.g., kernel32!Beep -> kernelbase!Beep)

- Resource Section (.rsrc)
Embedded resources: icons, strings, version info, and sometimes payloads

- Manual Mapping
Parsing and loading a PE by hand: map sections, fix relocations, resolve IAT,
call TLS callbacks, then call entry point — foundation of reflective loading

- Relocation Fixups
Patching absolute addresses when image loads at a different base than preferred
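
As a minimal sketch of the DOS Header / NT Headers entry above, the following
C++ reads the MZ magic, follows e_lfanew to the PE signature, and pulls a few
fields out of the file and optional headers. The PeSummary struct and the
function names are illustrative, not part of any Windows API, and a real parser
would validate far more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Fields this sketch extracts; the struct and its names are illustrative.
struct PeSummary {
    std::size_t   nt_offset;      // e_lfanew: file offset of the "PE\0\0" signature
    std::uint16_t machine;        // IMAGE_FILE_HEADER.Machine (0x8664 = x64)
    std::uint16_t section_count;  // IMAGE_FILE_HEADER.NumberOfSections
    bool          is_pe32_plus;   // optional-header Magic == 0x20B (64-bit image)
};

// Read a little-endian value of 1..4 bytes; callers bounds-check first.
static std::uint32_t read_le(const std::vector<std::uint8_t>& buf,
                             std::size_t off, std::size_t width) {
    assert(width <= 4 && off + width <= buf.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < width; ++i)
        v |= static_cast<std::uint32_t>(buf[off + i]) << (8 * i);
    return v;
}

std::optional<PeSummary> parse_pe_headers(const std::vector<std::uint8_t>& buf) {
    // IMAGE_DOS_HEADER is 64 bytes: e_magic at offset 0, e_lfanew at 0x3C.
    if (buf.size() < 0x40) return std::nullopt;
    if (read_le(buf, 0, 2) != 0x5A4D) return std::nullopt;  // "MZ"

    const std::size_t nt = read_le(buf, 0x3C, 4);
    // Need signature (4) + IMAGE_FILE_HEADER (20) + optional-header Magic (2).
    if (nt > buf.size() || buf.size() - nt < 26) return std::nullopt;
    if (read_le(buf, nt, 4) != 0x00004550) return std::nullopt;  // "PE\0\0"

    PeSummary s{};
    s.nt_offset     = nt;
    s.machine       = static_cast<std::uint16_t>(read_le(buf, nt + 4, 2));
    s.section_count = static_cast<std::uint16_t>(read_le(buf, nt + 6, 2));
    s.is_pe32_plus  = read_le(buf, nt + 24, 2) == 0x20B;
    return s;
}
```

Calling parse_pe_headers on a file's bytes returns std::nullopt for anything
that fails these basic checks, which keeps malformed input from being read
out of bounds.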

----------------------------------------------------------------
A2. WINDOWS INTERNALS
----------------------------------------------------------------
- Object Manager
Kernel subsystem managing all named/unnamed kernel objects
(files, events, mutexes, processes, threads)

- Handle Tables
Per-process table mapping handle values to kernel object pointers

- Access Tokens & Security Reference Monitor (SRM)
Tokens carry user SID, group SIDs, privileges; SRM enforces access checks

- ALPC (Advanced Local Procedure Call)
High-performance IPC mechanism used internally by Windows (replaces LPC)

- Executive & Kernel layers
HAL -> Kernel -> Executive (Ob, Mm, Io, Se, Ps, etc.) -> Subsystems

- Virtual Memory Manager (VMM)
Manages VADs, page tables, working sets, paged/non-paged pool

- I/O Manager & IRP
Manages driver stack communication via I/O Request Packets

- Session & Desktop isolation
Sessions separate logon contexts; each session contains window stations,
which in turn contain desktops that isolate windows and input

----------------------------------------------------------------
A3. SHELLCODE ENGINEERING
----------------------------------------------------------------
- Position-Independent Code (PIC)
Code that works regardless of where it's loaded — no hardcoded addresses;
uses delta offsets or GetPC techniques

- GetPC Techniques
Getting the current instruction pointer value at runtime
(e.g., CALL/POP trick, LEA RIP-relative on x64)

- Null-Byte Avoidance
Many injection vectors treat 0x00 as string terminator; shellcode must
avoid null bytes through instruction substitution

- Encoder / Decoder Stubs
XOR, ROT, or custom encoders wrap shellcode; decoder runs first,
decodes in-place, then jumps to payload

- Syscall Shellcode
Shellcode that invokes syscalls directly without relying on API stubs

- Alphanumeric Shellcode
Shellcode restricted to printable ASCII characters — bypasses filters
that only allow text input

- Egg Hunters
Small shellcode that searches process memory for a unique tag (egg)
preceding the real payload — useful when injection space is limited

- Staged vs Stageless Payloads
Stageless: entire payload in one blob
Staged: small stager downloads and executes the real payload from a C2

- Stack Pivoting
Redirect the stack pointer (RSP/ESP) to attacker-controlled memory
to enable ROP chain execution

- ROP Chains (Return-Oriented Programming)
Chain together existing code "gadgets" (ending in RET) to execute
arbitrary logic without injecting new code — bypasses DEP/NX

================================================================
SECTION B — EXPLOITATION
================================================================

----------------------------------------------------------------
B1. EXPLOIT DEVELOPMENT
----------------------------------------------------------------
- Buffer Overflow (Stack)
Overwrite return address on the stack to redirect execution

- Buffer Overflow (Heap)
Corrupt heap metadata or adjacent allocations to gain control

- Use-After-Free (UAF)
Access memory after it has been freed; if reallocated with attacker
data, leads to type confusion or code execution

- Heap Corruption
Corrupt allocator metadata (free lists, chunk headers) to redirect writes

- Format String Vulnerabilities
Uncontrolled format strings (%n, %x) allow arbitrary read/write

- Integer Overflows / Underflows
Arithmetic wrapping leads to incorrect size calculations and
exploitable allocations

- Race Conditions (TOCTOU)
Time-of-check vs time-of-use: win a race between check and use
to substitute a different resource

- DEP / NX Bypass
Data Execution Prevention marks memory non-executable;
bypassed via ROP, ret2libc, or JIT spraying

- ASLR Bypass
Address Space Layout Randomization randomizes base addresses;
bypassed via info leaks, partial overwrites, heap spraying, or brute force

- ROP / JOP / COP
Return/Jump/Call Oriented Programming — code reuse attack variants

- Heap Feng Shui
Carefully shape heap layout to place attacker data adjacent to
target structures before triggering a vulnerability

- SEH Exploitation (Windows)
Overwrite Structured Exception Handler chain to redirect execution
on exception

- Browser Exploitation Concepts
JIT compiler abuse, sandbox escapes, type confusion in JS engines,
renderer vs browser process privilege separation

- Kernel Exploitation Basics
NULL pointer dereference, pool overflows, race conditions in drivers,
token stealing shellcode to escalate to SYSTEM
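
To make the Integer Overflows / Underflows entry above concrete from the
defensive side, here is a small sketch of how a 32-bit size calculation wraps
and how a pre-multiplication check rejects it. The function names are
hypothetical; the point is only the arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Unchecked: the 32-bit product wraps modulo 2^32, so a huge element count
// can yield a tiny allocation size.
std::uint32_t alloc_size_unchecked(std::uint32_t count, std::uint32_t elem_size) {
    return static_cast<std::uint32_t>(count * elem_size);
}

// Checked: refuse the request if count * elem_size would not fit in 32 bits.
std::optional<std::uint32_t> alloc_size_checked(std::uint32_t count,
                                                std::uint32_t elem_size) {
    const std::uint32_t max = std::numeric_limits<std::uint32_t>::max();
    if (elem_size != 0 && count > max / elem_size) return std::nullopt;
    return static_cast<std::uint32_t>(count * elem_size);
}
```

With count = 0x40000001 and elem_size = 4, the unchecked version returns 4
while the caller believes it reserved room for over a billion elements; the
checked version returns std::nullopt instead.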

================================================================
SECTION C — MALWARE INTERNALS
================================================================

----------------------------------------------------------------
C1. PROCESS & MEMORY INTERNALS
----------------------------------------------------------------
- Process Hollowing
Spawn a legit process suspended, hollow out its memory, replace with payload

- Process Doppelganging
Write a modified executable inside an NTFS transaction, create an image
section from it, then roll the transaction back so it never commits to disk

- Process Herpaderping
Create an image section from a file, then overwrite the file on disk before
the initial thread is created; scanners that inspect the file at that point
see the decoy content

- Process Ghosting
Create a file, mark it for deletion, map it as an image, then run it —
appears to run from an already-deleted file

- PEB Walking
Manually find loaded modules via the Process Environment Block (no API calls)

- VAD Manipulation
Tamper with Virtual Address Descriptors to hide memory regions

- Page Table Manipulation
Directly manipulate page tables at a lower level than VAD tricks

- Heap Spraying
Fill the heap with many copies of a payload so that a corrupted pointer is
likely to land in attacker-controlled data

- Pool Spraying
Kernel-mode equivalent of heap spraying; targets kernel pool allocations

- EXE Packing (Custom Packer)
Compress/encrypt an executable; stub decompresses and runs it at runtime

- DLL Memory Loading (Reflective DLL Injection)
Load a DLL from a byte buffer in memory instead of from disk

- Thread Hijacking
Suspend an existing thread, redirect its instruction pointer, resume it

- Memory Patching
Overwrite bytes in a running process to change its behavior

----------------------------------------------------------------
C2. HOOKING TECHNIQUES
----------------------------------------------------------------
- Inline Hooking
Patch the first bytes of a function with a JMP to your handler
(5 bytes for a rel32 JMP on x86; x64 often needs a longer absolute jump)

- Trampoline Hooks
Inline hook that also preserves and calls the original function

- Detours-style Hook
Microsoft Detours approach — robust inline hook with trampoline

- IAT Hooking
Replace function pointers in the Import Address Table

- VTable Hooking
Overwrite C++ virtual function table pointers

- GOT/PLT Hooking (Linux)
Overwrite Global Offset Table entries to redirect function calls

- SSDT Hooking
Hook the kernel's System Service Descriptor Table (kernel mode)

- Kernel Callback Hooking
Remove or overwrite callbacks registered through
PsSetCreateProcessNotifyRoutine and similar routines to blind EDR/AV
kernel drivers

- IRP Hooking
Replace entries in a driver object's MajorFunction dispatch table to
intercept I/O Request Packets

- SYSENTER / SYSCALL Hooking
Modify MSRs (IA32_LSTAR for SYSCALL, IA32_SYSENTER_EIP for SYSENTER)
to intercept the syscall entry point

----------------------------------------------------------------
C3. CODE INJECTION TECHNIQUES
----------------------------------------------------------------
- Classic DLL Injection
WriteProcessMemory + CreateRemoteThread -> LoadLibrary

- APC Injection
Queue an Async Procedure Call to a thread's APC queue

- Early Bird Injection
Inject via APC before the process fully initializes

- SetThreadContext Injection
Redirect a suspended thread's context registers to shellcode

- Fiber Injection
Hijack user-mode fibers to execute code inside a target process

- Transacted Hollowing
Hybrid of Process Hollowing and Doppelganging: map a section created from a
transacted (TxF, Transactional NTFS) file into a suspended process

- Heaven's Gate
In a WoW64 process, far-jump from the 32-bit code segment to the 64-bit one
mid-execution to bypass 32-bit user-mode hooks

- Atom Bombing
Use Windows global atom tables as a data smuggling channel

- ptrace Injection (Linux)
Use ptrace() syscall to read/write memory and registers of a process

- LD_PRELOAD Hijacking (Linux)
Force a process to load your shared library before all others

----------------------------------------------------------------
C4. EVASION & ANTI-ANALYSIS
----------------------------------------------------------------
- API Unhooking
Restore ntdll from a clean copy to remove AV/EDR hooks

- Direct Syscalls
Invoke syscalls by number, bypassing hooked user-mode API stubs

- Indirect Syscalls
JMP into ntdll's syscall instruction to avoid non-module execution

- Syscall Stomping
Overwrite an unused syscall stub with your own to blend in

- Unhooking via KnownDlls Cache
Load clean ntdll from the KnownDlls section object

- ETW Patching
Patch ETW to blind event logging and telemetry

- Call Stack Spoofing / Return Address Spoofing
Fake the call stack to hide the real caller from EDR stack walking

- Sleep Obfuscation
Encrypt shellcode in memory while sleeping to evade memory scanning

- Stack Encryption
Encrypt the stack during sleep/wait periods

- Gargoyle Memory Hiding
Mark shellcode as non-executable while not running; flip back on timer

- Timing Attacks / Sleep Skipping Detection
Detect sandbox time acceleration; behave benignly when detected

- PPID Spoofing
Fake the parent process ID of a spawned process

- Misleading Disassembly
Insert junk bytes or overlapping instructions to fool disassemblers

- Hardware Breakpoint Detection
Inspect the debug registers (Dr0-Dr3 hold addresses, Dr7 enables them)
to detect hardware breakpoints

- AMSI Bypass
Patch or tamper with the Antimalware Scan Interface to blind
script-based detection

================================================================
SECTION D — PRIVILEGE & CREDENTIALS
================================================================

----------------------------------------------------------------
D1. CREDENTIAL & PRIVILEGE TECHNIQUES
----------------------------------------------------------------
- Token Impersonation
Steal/duplicate another process's access token

- Pass-the-Hash
Authenticate using an NTLM hash without the plaintext password

- LSASS Dumping
Extract credential material from LSASS process memory

- DPAPI Abuse
Decrypt Chrome cookies, WiFi passwords, Windows credentials via
CryptProtectData / CryptUnprotectData

- Kerberoasting
Request TGS tickets for SPNs and crack service account passwords offline

- Golden Ticket
Forge a Kerberos TGT using the KRBTGT hash — full domain access

- Silver Ticket
Forge a service ticket (TGS) for one specific service using that service
account's hash, without touching the DC

- Shadow Credentials
Add key credentials to an AD object as a stealthy backdoor

- Skeleton Key
Patch LSASS to accept a universal master password

- UAC Bypass
Escalate to high-integrity without a UAC prompt

- ACL Abuse
Exploit weak permissions on registry keys, services, or files

================================================================
SECTION E — ACTIVE DIRECTORY TRADECRAFT
================================================================

----------------------------------------------------------------
E1. AD ATTACKS & ABUSE
----------------------------------------------------------------
- DCSync
Impersonate a DC to request password hashes via MS-DRSR replication protocol

- DCShadow
Register a rogue DC temporarily to push malicious AD changes

- BloodHound Graph Abuse
Use BloodHound-collected AD relationship data to find attack paths
to Domain Admin

- Constrained Delegation Abuse
Abuse services allowed to delegate to specific targets to impersonate users

- Resource-Based Constrained Delegation (RBCD)
Write msDS-AllowedToActOnBehalfOfOtherIdentity to gain impersonation rights

- NTLM Relay
Capture and relay NTLM authentication to authenticate to other services

- PetitPotam
Coerce a DC to authenticate to an attacker via MS-EFSRPC — feeds NTLM relay

- PrinterBug (SpoolSample)
Abuse the Print Spooler to coerce DC authentication

- Zerologon (CVE-2020-1472)
Cryptographic flaw in Netlogon — set DC machine account password to empty

- AdminSDHolder Abuse
Modify AdminSDHolder ACL to propagate permissions to protected groups

- SID History Abuse
Add high-priv SID to a user's SID history as a stealthy backdoor

- Kerberos Delegation (Unconstrained)
Machines with unconstrained delegation store TGTs — coerce DC auth to steal it

================================================================
SECTION F — DEFENSIVE INTERNALS & EDR CONCEPTS
================================================================

----------------------------------------------------------------
F1. EDR / DETECTION ENGINEERING INTERNALS
----------------------------------------------------------------
- AMSI (Antimalware Scan Interface)
Windows API that allows AV/EDR to inspect script content
(PowerShell, VBScript, JScript) before execution

- ETW (Event Tracing for Windows) Providers & Consumers
Kernel and user-mode components emit structured events;
EDRs subscribe to security-relevant providers for telemetry

- ETWTI (ETW Threat Intelligence)
Kernel ETW provider (Microsoft-Windows-Threat-Intelligence) reporting
sensitive operations such as remote memory allocation, remote writes, and
APC queuing; used by modern EDRs and harder to blind than user-mode hooks

- Sysmon Internals
Sysinternals tool using kernel callbacks and ETW to log process
creation, network, registry, file, and driver events

- Userland vs Kernel Telemetry
Userland (IAT/inline hooks on ntdll) vs kernel (callbacks, ETW, minifilters)
— kernel telemetry is far harder to evade

- Minifilter Drivers
Kernel drivers that attach to the filter manager to intercept file I/O;
used by AV/EDR to scan files on access

- Kernel Callbacks
PsSetCreateProcessNotifyRoutine, PsSetLoadImageNotifyRoutine,
CmRegisterCallback — EDRs use these for visibility; malware tries to remove them

- CFG (Control Flow Guard)
Compiler+OS mitigation: validates indirect call targets against a bitmap
of valid function entry points

- CET / Hardware Shadow Stack
Intel CET pushes return addresses to a separate shadow stack protected
by hardware; defeats ROP chains that corrupt the normal stack

- PatchGuard (KPP)
Kernel Patch Protection: periodically checks integrity of SSDT, IDT,
GDT, and other kernel structures; BSODs on tampering

- HVCI / VBS (Hypervisor-Protected Code Integrity / Virtualization Based Security)
VBS uses the hypervisor to run an isolated secure environment beside the
normal kernel; HVCI enforces kernel code integrity from there, which makes
unsigned kernel code execution nearly impossible

- Protected Process Light (PPL)
Restricts which processes can open handles to sensitive processes
(like LSASS) with certain access rights

- LSASS Protection
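The RunAsPPL registry value makes LSASS a protected process (PPL);
dumping it then typically requires kernel-level access such as a signed
driver, or a process at an equal or higher protection level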

================================================================
SECTION G — REVERSE ENGINEERING
================================================================

----------------------------------------------------------------
G1. REVERSE ENGINEERING SKILLS
----------------------------------------------------------------
- Static Analysis
Reading disassembly without running it (IDA Pro, Ghidra, Binary Ninja)

- Dynamic Analysis
Running under a debugger (x64dbg, WinDbg)

- Anti-Debug Tricks
IsDebuggerPresent, NtQueryInformationProcess, timing checks, TLS callbacks

- Hardware Breakpoint Detection
Detect debuggers via debug register inspection (Dr0-Dr3, Dr6, Dr7)

- Unpacking
Extracting real payload from a packed/compressed executable

- Deobfuscation
Recovering readable code from obfuscated or encrypted samples

- Binary Patching
Modifying compiled binaries to change behavior

- Binary Diffing
Comparing two binary versions to find changes (Diaphora, BinDiff)
— essential for patch analysis and 1-day research

- Emulation / Unicorn Engine
Run shellcode in an emulated CPU without a full OS environment

- Taint Tracking / Symbolic Execution
Track attacker-controlled data flow through a binary (angr, Triton)

- Debugger Scripting
Automate analysis with IDAPython, x64dbg scripting (built-in script language
or Python plugins), WinDbg JS

================================================================
SECTION H — LINUX & CROSS-PLATFORM
================================================================

----------------------------------------------------------------
H1. LINUX TECHNIQUES
----------------------------------------------------------------
- ptrace Injection
Linux syscall for process inspection/control; abuse for code injection

- LD_PRELOAD Hijacking
Force a process to load your shared library before system libraries;
override functions like read(), write(), getuid()

- GOT / PLT Hooking
Overwrite Global Offset Table to redirect function calls in ELF binaries

- ELF Internals
ELF header, program headers, section headers, dynamic segment,
symbol tables — Linux equivalent of PE format knowledge

- /proc Manipulation
/proc/[pid]/mem for reading/writing process memory;
/proc/[pid]/maps for layout; used in Linux injection techniques

- eBPF Rootkits
Extended Berkeley Packet Filter programs run in kernel context;
can hook syscalls and hide processes/network connections

- Linux Capabilities Abuse
Fine-grained privilege system (CAP_SYS_ADMIN, CAP_NET_RAW, etc.)
— misconfigurations lead to container escapes and privilege escalation

- cron / systemd Persistence
Classic persistence via crontab entries or malicious systemd units
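
For the ELF Internals entry above, a minimal sketch of reading the fixed
start of an ELF header: the magic bytes, the class and data-encoding bytes
in e_ident, then e_type and e_machine. ElfSummary and parse_elf_header are
illustrative names, and the sketch stops well short of program or section
headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative summary of the first ELF header fields.
struct ElfSummary {
    bool          is_64bit;       // EI_CLASS == 2 (ELFCLASS64)
    bool          little_endian;  // EI_DATA  == 1 (ELFDATA2LSB)
    std::uint16_t type;           // e_type, e.g. 2 = ET_EXEC, 3 = ET_DYN
    std::uint16_t machine;        // e_machine, e.g. 62 = EM_X86_64
};

std::optional<ElfSummary> parse_elf_header(const std::vector<std::uint8_t>& buf) {
    static const std::uint8_t kMagic[4] = {0x7F, 'E', 'L', 'F'};
    // e_ident is 16 bytes; e_type and e_machine follow at offsets 16 and 18.
    if (buf.size() < 20 || !std::equal(kMagic, kMagic + 4, buf.begin()))
        return std::nullopt;

    const std::uint8_t cls  = buf[4];  // EI_CLASS
    const std::uint8_t data = buf[5];  // EI_DATA
    if ((cls != 1 && cls != 2) || (data != 1 && data != 2)) return std::nullopt;

    const bool le = (data == 1);
    // Multi-byte fields use the byte order declared in EI_DATA.
    auto read16 = [&](std::size_t off) {
        assert(off + 1 < buf.size());
        const unsigned lo = le ? buf[off] : buf[off + 1];
        const unsigned hi = le ? buf[off + 1] : buf[off];
        return static_cast<std::uint16_t>(lo | (hi << 8));
    };
    return ElfSummary{cls == 2, le, read16(16), read16(18)};
}
```

Honoring EI_DATA matters in practice: firmware images for big-endian targets
are valid ELF files whose multi-byte fields read backwards on an x86 host.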

================================================================
SECTION I — PERSISTENCE MECHANISMS
================================================================

- Registry Run Keys
HKCU\Software\Microsoft\Windows\CurrentVersion\Run

- Scheduled Tasks
Via COM or XML; survive reboots

- COM Hijacking
Replace a legitimate COM object with your own DLL

- DLL Proxying / DLL Side-Loading
Malicious DLL named to match what a legit app expects; forward real exports

- WMI Subscriptions
Trigger payloads on system events

- Boot/Login Scripts via GPO
Scripts in SYSVOL executed at boot/login

- SID History Abuse
Add high-priv SID to user's history as a stealthy backdoor

- SIH Abuse
Abuse Windows maintenance scheduled tasks

- Boot/Pre-OS (Bootkit)
MBR/VBR level persistence

================================================================
SECTION J — FIRMWARE & HARDWARE
================================================================

- UEFI Bootkit
Persist in SPI flash firmware (LoJax, CosmicStrand) — survives reinstalls

- SMM (System Management Mode) Rootkit
Executes in SMRAM, invisible to OS; triggered by SMIs

- PCIe DMA Attacks
Read/write host memory via PCIe/Thunderbolt without CPU (PCILeech)

- ACPI Table Tampering
Embed malicious code in custom ACPI methods

================================================================
SECTION K — HYPERVISOR & VM CONCEPTS
================================================================

- VM Exits
Conditions that cause a guest VM to trap back to the hypervisor (VMM);
hypervisors monitor sensitive instructions via VM exits

- EPT Hooking (Extended Page Tables)
Hook guest physical memory mappings at the hypervisor level —
invisible to the guest OS; used in stealth monitors and rootkits

- Blue Pill Rootkit Concept
Transparently insert a hypervisor under a running OS; OS is unaware
it's now a VM guest

- Hypervisor Introspection (VMI)
Inspect guest VM memory and state from the hypervisor without
touching the guest — powerful for transparent monitoring

- Intel VT-x Internals
VMX root/non-root operation, VMCS fields, VMLAUNCH/VMRESUME,
EPT, VPID — foundational for building a hypervisor

- CPUID Fingerprinting
Detect virtualization via CPUID hypervisor bit and vendor strings

- Timing-Based VM Detection
RDTSC delta differences between bare metal and VM environments

- SGX Enclaves
Intel Software Guard Extensions — isolated encrypted memory regions
even the OS/hypervisor can't read; used for secrets and anti-analysis

- TPM Abuse Concepts
Trusted Platform Module sealing/unsealing secrets tied to platform state;
research into PCR manipulation and TPM-based malware resilience

================================================================
SECTION L — NETWORK, C2 & TRAFFIC EVASION
================================================================

- C2 Protocol Mimicry
Disguise traffic as: HTTPS, DNS, MS Graph API, Telegram, Slack, OneDrive

- JA3 / JA3S Fingerprinting
Fingerprint TLS clients/servers from handshake parameters;
EDRs/NDRs use this to identify C2 tools

- JARM Fingerprint Spoofing
Manipulate active TLS fingerprint to avoid C2 server identification

- HTTP/2 C2
Use HTTP/2 multiplexing to blend C2 traffic into normal web traffic

- QUIC-Based Transport
UDP-based protocol; harder to inspect than TCP/TLS streams

- Domain Fronting
Route C2 through a CDN by naming an allowed domain in the TLS SNI and the
real backend in the HTTP Host header; largely mitigated, replaced by
CDN impersonation

- Dead Drop Resolvers
Store C2 address in a public service (Twitter, Pastebin, GitHub)
so the real C2 IP never appears in the binary

- DGA (Domain Generation Algorithms)
Algorithmically generate hundreds of domain names; only the attacker
knows which one is registered today

- Fast Flux DNS
Rapidly rotate IPs behind a C2 domain to evade IP blocklists

- Peer-to-Peer Botnets
Decentralized C2 with no single point of failure; nodes relay commands

- Traffic Shaping
Throttle and time C2 beacons to mimic normal user browser traffic

- Covert Channels
Hide data in protocol fields not meant for data (DNS TXT, ICMP payload,
HTTP headers, image steganography)

- C2 Over WebSocket / gRPC
Modern protocol channels that blend naturally into enterprise traffic

- Living Off the Land (LOLBins)
Use built-in Windows binaries to avoid dropping files:
mshta, regsvr32, cscript, wmic, certutil, rundll32, msiexec, bitsadmin
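
On the detection side, the JA3 entry above can be sketched as follows: the
fingerprint input is a comma-separated string of five ClientHello fields,
each list joined with dashes, with GREASE values skipped. The published JA3
value is the MD5 of that string; hashing is left out here because the C++
standard library has no MD5. Function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// GREASE values (RFC 8701) look like 0x0A0A, 0x1A1A, ... 0xFAFA.
bool is_grease(std::uint16_t v) {
    return (v & 0x0F0F) == 0x0A0A && (v >> 8) == (v & 0xFF);
}

// Join decimal values with '-', skipping GREASE entries.
static std::string join_dash(const std::vector<std::uint16_t>& vals) {
    std::string out;
    for (std::uint16_t v : vals) {
        if (is_grease(v)) continue;
        if (!out.empty()) out += '-';
        out += std::to_string(v);
    }
    return out;
}

// Build the pre-hash JA3 string:
// TLSVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats
// Point formats are single bytes on the wire; uint16_t is used for uniformity.
std::string ja3_string(std::uint16_t tls_version,
                       const std::vector<std::uint16_t>& ciphers,
                       const std::vector<std::uint16_t>& extensions,
                       const std::vector<std::uint16_t>& curves,
                       const std::vector<std::uint16_t>& point_formats) {
    return std::to_string(tls_version) + "," + join_dash(ciphers) + "," +
           join_dash(extensions) + "," + join_dash(curves) + "," +
           join_dash(point_formats);
}
```

Because the string depends on the order and content of what the client
offers, two tools built on the same TLS library with the same settings tend
to share a fingerprint, which is why it is a hint, not proof of identity.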

================================================================
SECTION M — ADVANCED RESEARCH TOPICS
================================================================

- DKOM (Direct Kernel Object Manipulation)
Directly modify kernel structures (e.g., unlink a process from
ActiveProcessLinks to hide it from task managers)

- Object Callbacks
ObRegisterCallbacks: kernel mechanism that runs pre/post callbacks on handle
create/duplicate operations and can strip requested access rights;
used by EDR and anti-cheat, abused by rootkits

- Heaven's Gate Variants
Beyond 32->64 mode switch: variants for syscall table switching
and wow64 layer abuse

- Gargoyle Memory Hiding
Execute shellcode, then mark it non-executable and hide it in heap;
re-arm via timer to re-execute later

- Sleep Obfuscation Techniques
Encrypt implant in memory during sleep: Ekko, Foliage, Cronos variants

- Stack Encryption
XOR or AES the stack during wait periods to evade memory scanning

- Return Address Spoofing
Overwrite return addresses on the stack to fake call origin

- Intel VT-x Internals
VMCS, EPT, VM exits — foundation for building custom hypervisors

- Kernel Patch Protection (PatchGuard) Internals
How PatchGuard works: encrypted timer callbacks, integrity checks,
randomized scheduling — and why bypassing it is extremely difficult

- ETWTI (ETW Threat Intelligence Provider)
Kernel ETW provider emitting thread/process events used by modern EDRs;
tampering with it requires kernel access and risks tripping PatchGuard

================================================================
SECTION N — LEARNING RESOURCES
================================================================

Courses:
- OSCP (Offensive Security Certified Professional)
- OSED (Offensive Security Exploit Developer)
- CRTO (Certified Red Team Operator)
- CRTE (Certified Red Team Expert — AD focused)
- Sektor7 Malware Development (intro + intermediate + rootkits)
- SANS FOR610 (Reverse Engineering Malware)
- SANS SEC760 (Advanced Exploit Development)
- TCM Security Malware Analysis Courses

Books:
- The Shellcoder's Handbook
- Practical Malware Analysis (Sikorski & Honig)
- Windows Internals Parts 1 & 2 (Russinovich et al.)
- The Art of Memory Forensics
- Rootkits: Subverting the Windows Kernel
- Hacking: The Art of Exploitation (Erickson)
- The Web Application Hacker's Handbook

Disassemblers / Decompilers:
- IDA Pro (industry standard)
- Ghidra (free, NSA open-source)
- Binary Ninja (scriptable, modern UI)
- Cutter / Rizin (free open-source)

Debuggers:
- x64dbg (Windows user-mode)
- WinDbg / WinDbg Preview (kernel + user-mode)
- GDB + pwndbg/peda (Linux)

Dynamic Instrumentation:
- Frida (scriptable, cross-platform)
- DynamoRIO (binary translation framework)
- Intel Pin (x86 instrumentation)

System Inspection:
- Process Hacker / System Informer
- Process Monitor (ProcMon)
- API Monitor

Network Analysis:
- Wireshark
- Zeek (formerly Bro)
- FakeNet-NG (dynamic network analysis for malware)

Emulation / Symbolic Execution:
- Unicorn Engine (CPU emulation)
- angr (symbolic execution)
- Triton (dynamic taint + symbolic)

Hardware / DMA:
- PCILeech / MemProcFS

Practice Environments:
- TryHackMe
- HackTheBox
- VulnHub
- ANY.RUN (online sandbox)
- MalwareBazaar (real samples)
- FLARE-VM (Windows RE environment)
- REMnux (Linux RE environment)
- pwn.college (exploit development)

================================================================
NOTE: These concepts are for educational purposes —
malware analysis, red teaming, CTFs, and security research.
Always operate within legal boundaries and in authorized
environments (your own lab, CTFs, bug bounty programs).
================================================================

Monday, June 1, 2026

Reverse Engineering: Understanding the Thoughts Behind Systems

A software program or hardware system is usually the result of people organizing ideas, logic, constraints, and decisions to solve a problem. The final product becomes a kind of “frozen thinking” expressed through:

- code
- circuit layouts
- protocols
- algorithms
- mechanical structures
- data formats
- timing behavior
- UI decisions
- optimization tricks

So reverse engineering is often the process of working backward from the finished system to understand:

- what problem the creators were solving
- how the system works internally
- why certain design decisions were made
- what assumptions or constraints existed
- how components interact

In software, that may involve:

- studying binaries
- analyzing assembly
- tracing execution
- reconstructing algorithms
- understanding data structures

In hardware, it may involve:

- tracing PCB connections
- identifying chips
- analyzing signals
- reconstructing schematics
- understanding timing and electrical behavior

So in a philosophical sense, reverse engineering can feel like “reading the engineers’ thought process” indirectly through the artifact they created.

But it’s important to understand a distinction:

You are not literally reading their thoughts — you are inferring them from evidence left behind in the design.

Sometimes those inferences are accurate.
Sometimes multiple different thought processes could produce the same result.

For example:

- an unusual algorithm might reveal a performance optimization mindset
- extra security checks may reveal concern about tampering
- elegant modular design may show emphasis on maintainability
- messy duplicated logic may show deadline pressure or rapid iteration

Experienced reverse engineers often become good at recognizing “engineering fingerprints”:

- compiler patterns
- coding styles
- architectural habits
- optimization strategies
- hardware design conventions

In that sense, reverse engineering is partly technical analysis and partly reasoning about human design decisions.

Software Architectures for Arduino and Embedded Systems

1 - Monolithic Architecture

Monolithic architecture is the simplest and most common approach used in small Arduino projects. In this design, nearly all functionality is placed directly inside the main Arduino sketch using the setup() and loop() functions. Sensor reading, display handling, communication, and control logic are all processed sequentially inside a single program structure.

This architecture is easy to understand and requires minimal memory, making it suitable for beginners and small microcontrollers such as the Arduino Uno and Nano. However, as the project grows larger, the code can become difficult to maintain because all system components are tightly connected.

2 - Modular Architecture

Modular architecture divides the firmware into separate modules or source files, where each module handles a specific responsibility such as sensor management, display control, communication, or storage.

This approach improves code organization, readability, debugging, and reusability. Developers can modify one module without affecting the rest of the system significantly. Modular architecture is widely used in medium-sized Arduino and embedded projects because it provides better scalability compared to monolithic designs.

3 - Layered Architecture

Layered architecture organizes firmware into multiple logical layers. Common layers include application logic, middleware or services, hardware abstraction, drivers, and direct hardware interaction.

Each layer communicates with the layer directly below or above it. This structure improves portability and maintainability because hardware-specific code is separated from application logic. Layered architecture is common in professional embedded systems and advanced microcontroller frameworks.
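
A minimal sketch of that separation, written as host-testable C++: the application layer sees only an abstract sensor interface, and a driver underneath converts raw readings. The class names and the hardware numbers (10-bit ADC, 5 V reference, 10 mV per degree) are assumptions for illustration; on a real board the raw-read callback might wrap analogRead().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hardware abstraction layer: all the application is allowed to know.
class TemperatureSensor {
public:
    virtual ~TemperatureSensor() = default;
    virtual float read_celsius() = 0;
};

// Driver layer: converts raw ADC counts to degrees Celsius.
// Assumed hardware: 10-bit ADC, 5 V reference, 10 mV per degree.
class AdcTemperatureSensor : public TemperatureSensor {
public:
    explicit AdcTemperatureSensor(std::function<std::uint16_t()> read_raw)
        : read_raw_(std::move(read_raw)) {
        assert(read_raw_ != nullptr);
    }
    float read_celsius() override {
        const float volts = static_cast<float>(read_raw_()) * 5.0f / 1023.0f;
        return volts * 100.0f;  // 10 mV per degree
    }
private:
    std::function<std::uint16_t()> read_raw_;
};

// Application layer: contains no hardware-specific code at all.
class FanController {
public:
    FanController(TemperatureSensor& sensor, float threshold_c)
        : sensor_(sensor), threshold_c_(threshold_c) {}
    bool should_run() { return sensor_.read_celsius() > threshold_c_; }
private:
    TemperatureSensor& sensor_;
    float threshold_c_;
};
```

Swapping the sensor for a different part means writing another TemperatureSensor subclass; FanController stays untouched, which is the portability benefit the layering is meant to buy.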

4 - Event-Driven Architecture

Event-driven architecture is based on reacting to events instead of continuously checking every subsystem in sequence. Events may include button presses, timer expirations, sensor triggers, serial communication, or network messages.

When an event occurs, the firmware executes a corresponding handler function. This architecture improves responsiveness and is commonly used in menu systems, IoT devices, robotics, and automation systems.
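
The idea can be sketched as a queue of events plus a table of handlers. This is a simplified, host-testable illustration with hypothetical names; in real firmware, events posted from an interrupt would need an interrupt-safe queue instead of std::queue.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

enum class EventType : std::size_t { ButtonPress, TimerExpired, SerialByte, Count };

struct Event {
    EventType type;
    int payload;  // meaning depends on the event type
};

class EventLoop {
public:
    using Handler = std::function<void(const Event&)>;

    // Register the handler to run for one event type.
    void on(EventType type, Handler handler) {
        handlers_[index(type)] = std::move(handler);
    }

    // Queue an event; nothing runs until dispatch_pending() is called.
    void post(const Event& e) { queue_.push(e); }

    // Run handlers for all queued events; returns how many were handled.
    // Events with no registered handler are dropped.
    std::size_t dispatch_pending() {
        std::size_t handled = 0;
        while (!queue_.empty()) {
            const Event e = queue_.front();
            queue_.pop();
            const Handler& h = handlers_[index(e.type)];
            if (h) {
                h(e);
                ++handled;
            }
        }
        return handled;
    }

private:
    static std::size_t index(EventType t) {
        const auto i = static_cast<std::size_t>(t);
        assert(i < static_cast<std::size_t>(EventType::Count));
        return i;
    }
    std::array<Handler, static_cast<std::size_t>(EventType::Count)> handlers_{};
    std::queue<Event> queue_;
};
```

In an Arduino sketch, loop() would call dispatch_pending() on each pass, while input code and timers only call post().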

5 - State Machine Architecture

State machine architecture organizes firmware behavior into defined states such as idle, running, paused, error, or sleep. The system transitions between these states depending on conditions or events.

This architecture provides predictable system behavior and simplifies debugging. State machines are widely used in robotics, automation controllers, industrial systems, and embedded devices that require clear operational flow.

6 - Finite State Machine (FSM)

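A finite state machine (FSM) is a state machine with a fixed, finite set of states in which every allowed transition is explicitly defined, typically as a table or switch statement that maps a (state, event) pair to the next state.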

FSMs are commonly used in communication protocols, menu systems, LED animation controllers, and sequential process control because they provide clear and structured logic flow.
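As a minimal sketch of the idea, the transitions can be written out explicitly in one function. The state and event names below are hypothetical, chosen only to mirror the idle/running/paused/error states mentioned earlier; a real project would define its own.

```c
/* Hypothetical states and events, for illustration only. */
typedef enum { ST_IDLE, ST_RUNNING, ST_PAUSED, ST_ERROR } state_t;
typedef enum { EV_START, EV_PAUSE, EV_RESUME, EV_FAULT, EV_RESET } event_t;

/* Return the next state for a (current state, event) pair.
   Any pair that is not listed leaves the state unchanged. */
state_t fsm_next(state_t s, event_t e)
{
    if (e == EV_FAULT) return ST_ERROR;   /* a fault wins from any state */
    switch (s) {
    case ST_IDLE:    return (e == EV_START)  ? ST_RUNNING : s;
    case ST_RUNNING: return (e == EV_PAUSE)  ? ST_PAUSED  : s;
    case ST_PAUSED:  return (e == EV_RESUME) ? ST_RUNNING : s;
    case ST_ERROR:   return (e == EV_RESET)  ? ST_IDLE    : s;
    }
    return s;
}
```

Because every transition lives in one place, an unexpected event (for example EV_START while paused) is visibly ignored rather than handled by accident.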

7 - Cooperative Multitasking

Cooperative multitasking simulates multitasking without using an operating system. The firmware is divided into multiple short tasks that execute repeatedly inside the main loop.

Each task must return quickly so other tasks can execute without delays. Timing is commonly handled using millis() instead of blocking functions such as delay(). This architecture is extremely popular in Arduino development.
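The millis()-style timing check can be sketched as a small helper. The task_timer_t type and task_due() function are hypothetical names, and the current time is passed in as a parameter so the logic can be compiled and tried on a PC as well as on a microcontroller.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical task record: the task should run every period_ms milliseconds. */
typedef struct {
    uint32_t period_ms;
    uint32_t last_run_ms;
} task_timer_t;

/* Return true (and rearm the timer) when the task is due.
   The unsigned subtraction stays correct when the millisecond
   counter wraps around from 0xFFFFFFFF back to 0. */
bool task_due(task_timer_t *t, uint32_t now_ms)
{
    if ((uint32_t)(now_ms - t->last_run_ms) >= t->period_ms) {
        t->last_run_ms = now_ms;
        return true;
    }
    return false;
}
```

In an Arduino sketch the main loop would call something like task_due(&blink, millis()) for each task and run the task body only when it returns true, so no task ever blocks the others.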

8 - Scheduler-Based Architecture

Scheduler-based architecture uses a scheduler to determine when tasks should run. Tasks may execute periodically at fixed intervals such as every few milliseconds or seconds.

This approach simplifies timing management and improves organization in projects containing multiple timed operations. Scheduler libraries are commonly used in automation and sensor-based systems.

9 - RTOS Architecture

RTOS architecture uses a real-time operating system such as FreeRTOS to manage multitasking. Tasks run independently and may use priorities, queues, semaphores, and synchronization mechanisms.

This architecture enables true multitasking and is commonly used on advanced microcontrollers such as ESP32, STM32, and RP2040. RTOS systems are suitable for complex real-time applications but require more memory and system resources.

10 - Actor Architecture

Actor architecture divides the system into independent software actors that communicate through messages instead of shared variables.

Each actor processes information independently, improving modularity and concurrency handling. This architecture is more common in advanced embedded systems and multicore microcontrollers.

11 - Service-Oriented Architecture

Service-oriented architecture divides firmware into services such as networking, storage, display management, sensor processing, or LED control.

Each service provides specific functionality through defined APIs. This architecture improves separation of concerns and is commonly used in IoT firmware and smart device systems.

12 - Plugin Architecture

Plugin architecture allows features or modules to be added or removed independently. In Arduino systems, plugins are usually compile-time modules because smaller microcontrollers typically cannot load binary modules dynamically.

This architecture is common in configurable firmware such as LED effect systems and home automation controllers.

13 - Component-Based Architecture

Component-based architecture builds the system using reusable software components. Each component encapsulates its own functionality and interfaces.

This approach improves reusability and maintainability and is commonly used in robotics frameworks, GUI systems, and large embedded applications.

14 - Dataflow Architecture

Dataflow architecture organizes processing as a flow of data through multiple stages such as acquisition, filtering, transformation, and output.

This architecture is useful in digital signal processing, sensor fusion, audio processing, and data streaming systems because it clearly represents how information moves through the firmware.

15 - Interrupt-Driven Architecture

Interrupt-driven architecture uses hardware or software interrupts to respond immediately to important events such as timer overflows, UART communication, encoder pulses, or GPIO changes.

Interrupts improve responsiveness and timing precision. However, interrupt handlers must remain short and efficient to avoid system instability.

16 - Reactive Architecture

Reactive architecture continuously reacts to changing system conditions. Examples include responding to sensor thresholds, battery voltage changes, or communication events.

This architecture is widely used in automation systems, smart sensors, and adaptive embedded devices.

17 - Command Architecture

Command architecture processes commands received from serial communication, EEPROM, SD cards, filesystems, or network interfaces.

Commands may control LEDs, animations, settings, or device operations. This approach is useful in configurable firmware, scripting systems, and automation controllers.

18 - Pipeline Architecture

Pipeline architecture divides operations into sequential stages where data flows from one stage to another.

For example, a system may read data, decode it, process it, and display it in separate stages. This architecture is useful for streaming systems, binary processing, and LED animation engines.

19 - MVC Architecture

MVC stands for Model-View-Controller. This architecture separates application data, visual representation, and user interaction into different sections.

Although less common in small embedded systems, MVC is useful in touchscreen interfaces, menu-driven systems, and graphical user interfaces.

20 - Hardware Abstraction Layer (HAL)

A hardware abstraction layer provides generic interfaces for hardware operations while hiding low-level hardware details.

Instead of directly controlling registers or GPIO pins throughout the firmware, the application uses abstracted hardware functions. HAL improves portability and simplifies migration between different microcontroller platforms such as AVR, ESP32, STM32, and RP2040.
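One common way to express a HAL in C is a struct of function pointers. The sketch below is only an illustration: gpio_hal_t, fake_gpio, and led_set() are hypothetical names, and the backend stores pin levels in RAM so the example runs on a PC. A real port would replace the backend with register accesses for the target chip.

```c
#include <stdint.h>

/* Hypothetical HAL interface: the application only sees these pointers. */
typedef struct {
    void    (*pin_write)(uint8_t pin, uint8_t level);
    uint8_t (*pin_read)(uint8_t pin);
} gpio_hal_t;

/* Stand-in backend that keeps pin levels in RAM (no real hardware). */
static uint8_t fake_pins[32];
static void    fake_write(uint8_t pin, uint8_t level) { fake_pins[pin % 32] = level ? 1 : 0; }
static uint8_t fake_read(uint8_t pin)                 { return fake_pins[pin % 32]; }

const gpio_hal_t fake_gpio = { fake_write, fake_read };

/* Application code: written once, against the interface only. */
void led_set(const gpio_hal_t *hal, uint8_t pin, uint8_t on)
{
    hal->pin_write(pin, on);
}
```

Porting to another microcontroller then means supplying a different gpio_hal_t instance; led_set() and everything built on it stay unchanged.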

Saturday, May 16, 2026

ATmega328P Bootloader

A Step-by-Step Tutorial

Writing an Optiboot-style UART Bootloader from Scratch

For Windows | Arduino IDE 1.8.x | ATmega328P @ 16MHz

Chapter 1: How the ATmega328P Boots

1.1 — What is the ATmega328P at its Core?

The ATmega328P is an 8-bit microcontroller made by Microchip (formerly Atmel). It contains three separate memory types:

Memory	Size	Purpose	Survives Power Off?
Flash	32KB	Stores your program code	Yes
SRAM	2KB	Stores variables while running	No
EEPROM	1KB	Stores small persistent data	Yes

We only care about Flash for our bootloader. SRAM and EEPROM are not involved in booting.

📖 Datasheet §1 — "The ATmega328P is a low-power CMOS 8-bit microcontroller based on the AVR enhanced RISC architecture"

1.2 — The Program Counter (PC)

The CPU has a special internal register called the Program Counter (PC). It holds the address of the next instruction to execute. The CPU is a machine that does this millions of times per second:

loop forever:

1. Read instruction at address stored in PC

2. Execute that instruction

3. Increment PC (or jump if instruction says so)

📖 Datasheet §6.3 — "The Program Counter (PC) is 14 bits wide, addressing the 16K word (32KB) program memory space"

⚠️ Note: PC counts in WORDS (2 bytes each). 2^14 = 16,384 words = 32,768 bytes = 32KB. You will see both word and byte addresses in the datasheet.

1.3 — The Flash Memory Map in Detail

The full 32KB Flash with real addresses:

BYTE ADDRESS CONTENT

┌────────────────────────────────────────────┐

│ 0x0000 ← RESET VECTOR │

│ 0x0002 ← INT0 vector │

│ 0x0004 ← INT1 vector │

│ ... other interrupt vectors │

├────────────────────────────────────────────┤

│ │

│ APPLICATION SECTION │

│ (your sketch / program) │

│ │

├────────────────────────────────────────────┤

│ 0x7C00 ← BOOT START (BOOTSZ=512 words) │

│ BOOTLOADER SECTION │

│ (our code lives here) │

│ 0x7FFF ← TOP OF FLASH │

└────────────────────────────────────────────┘

Total: 32,768 bytes (32KB)

📖 Datasheet §27.5, Table 27-5 — Defines boot section sizes and start addresses

1.4 — What Happens in the First Nanoseconds After Reset?

Things that cause a Reset: Power on (VCC rises), RESET pin pulled low, Watchdog Timer timeout, Brown-out detection, Software reset via watchdog.

📖 Datasheet §8.1 — "The ATmega328P provides several reset sources"

Regardless of reset source, the sequence is always:

Reset occurs

│

▼

CPU initializes internally (I/O registers set to reset values)

│

▼

┌─────────────────────────────────────────────┐

│ CPU checks: Is BOOTRST fuse programmed? │

└─────────────────────────────────────────────┘

│ │

NO YES

│ │

▼ ▼

PC = 0x0000 PC = 0x7C00

Your app runs Bootloader runs

📖 Datasheet §27.4 — "If the BOOTRST Fuse is programmed, the reset vector is pointing to the Boot Flash start address"

1.5 — The BOOTRST Fuse Bit

Fuse bits are configuration bits stored outside Flash in dedicated hardware. They survive power cycling and are written with a programmer like USBasp.

⚠️ AVR Fuse Convention: Bit = 1 means UNPROGRAMMED (not active, factory default). Bit = 0 means PROGRAMMED (active). So BOOTRST = 0 means bootloader IS active. This inverted logic confuses everyone at first!

1.6 — The Two Upload Scenarios

Scenario A: USBasp Without a Bootloader

The USBasp talks SPI directly to the chip hardware (ICSP header). It writes raw bytes to Flash starting at 0x0000. No bootloader is needed or involved. Even a completely blank chip can be programmed this way. BOOTRST fuse is NOT set, so CPU starts at 0x0000 and your code runs immediately.

Scenario B: Arduino IDE Upload (With Bootloader)

The Arduino IDE runs Avrdude which talks to the bootloader running on the chip over UART. A DTR pin pulse triggers a hardware reset, the chip resets with BOOTRST set so PC goes to 0x7C00, the bootloader starts, waits for Avrdude, receives the sketch over UART, writes it to Flash at 0x0000, then jumps to the application.

	USBasp (no bootloader)	Arduino IDE (with bootloader)
How it writes Flash	SPI/ICSP hardware	UART + bootloader software
Reset goes to	0x0000 always	0x7C00 (bootloader) first
Bootloader needed?	No	Yes
Can overwrite bootloader?	Yes (dangerous)	No (lock bits protect it)

1.7 — Every Startup, Without Exception

Once a bootloader is installed and BOOTRST is set, every single startup follows this flow:

POWER ON or RESET

│

▼

PC = 0x7C00 (BOOTRST forces this)

│

▼

BOOTLOADER RUNS

1. Initialize UART

2. Wait ~1 second for Avrdude

│

┌────┴────┐

│ │

Avrdude No response (timeout)

connects │

│ Jump to 0x0000

│ │

Receive YOUR SKETCH RUNS

& flash

sketch

│

Reset → bootloader → timeout → 0x0000

1.8 — Chapter 1 Summary

Concept	Key Fact
Program Counter	Holds address of next instruction, starts at 0x0000 or 0x7C00
Flash layout	Application at bottom (0x0000), bootloader at top (0x7C00)
BOOTRST fuse	0 = programmed = CPU starts at boot section on reset
AVR fuse logic	0 = active, 1 = inactive (inverted — always remember this!)
USBasp upload	SPI directly to hardware, no bootloader needed, writes from 0x0000
Arduino upload	UART to bootloader, bootloader writes sketch to 0x0000
Every startup	If BOOTRST set → bootloader always runs first, then jumps to app
Bootloader's job	Check for new upload → if none, hand control to application

Chapter 2: Bootloader Section & Fuses

We only care about 3 things from the fuse system:

1. BOOTRST → Tell CPU to start at bootloader on reset

2. BOOTSZ → Tell CPU how big our bootloader is (sets start address)

3. Lock bits → Protect bootloader from being overwritten

2.1 — BOOTSZ: Choosing Our Bootloader Size

📖 Datasheet §27.5, Table 27-5 — Boot section sizes and start addresses

BOOTSZ1	BOOTSZ0	Size (words)	Size (bytes)	Start Address (byte)
1	1	256 words	512 bytes	0x7E00
1	0	512 words	1024 bytes	0x7C00 ← we use this
0	1	1024 words	2048 bytes	0x7800
0	0	2048 words	4096 bytes	0x7000

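We pick 512 words (1024 bytes) starting at 0x7C00. Optiboot itself is hand-optimized to fit the smallest option (256 words at 0x7E00), but 256 words is too tight for a straightforward C bootloader like ours. 512 words gives comfortable headroom, and 1024 words or more wastes application space.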

2.2 — The Fuse Bytes (Only What We Touch)

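The ATmega328P has 3 fuse bytes. We only touch the High Fuse Byte (HFUSE). Our HFUSE value is 0xDC (binary 1101 1100): BOOTRST (bit 0) = 0 (active) and BOOTSZ1:0 (bits 2:1) = 10 (512 words). Be careful with the similar-looking value 0xDA: it selects BOOTSZ = 01 (1024 words, start 0x7800), which does not match our 0x7C00 link address.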

⚠️ Critical: SPIEN (bit 5) must stay 0 (programmed/active). It enables SPI programming. If you accidentally set it to 1, you can no longer program the chip with USBasp!

📖 Datasheet §27.4, Table 27-3 — High Fuse Byte bit description

avrdude -c usbasp -p m328p -U hfuse:w:0xDC:m
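To double-check a candidate HFUSE value before writing it, the relevant bits can be decoded on a PC. This is a small sketch with hypothetical helper names; it assumes the ATmega328P high fuse layout of BOOTRST in bit 0, BOOTSZ0 in bit 1, and BOOTSZ1 in bit 2, and a 32KB Flash ending at byte address 0x7FFF.

```c
#include <stdint.h>

/* Bit positions in the ATmega328P high fuse byte. */
#define HFUSE_BOOTRST  0
#define HFUSE_BOOTSZ0  1

/* 1 if BOOTRST is programmed (bit reads 0), meaning reset enters the boot section. */
int hfuse_bootrst_active(uint8_t hfuse)
{
    return ((hfuse >> HFUSE_BOOTRST) & 1u) == 0;
}

/* Byte address where the boot section starts, derived from BOOTSZ1:0. */
uint16_t hfuse_boot_start(uint8_t hfuse)
{
    uint8_t  bootsz     = (hfuse >> HFUSE_BOOTSZ0) & 0x3u;   /* BOOTSZ1:BOOTSZ0 as 0..3 */
    uint16_t size_bytes = (uint16_t)(4096u >> bootsz);       /* 00 -> 4096 ... 11 -> 512 */
    return (uint16_t)(0x8000u - size_bytes);
}
```

For example, 0xDC decodes to BOOTRST active with a boot start of 0x7C00, while the factory default 0xD9 decodes to BOOTRST inactive with a boot start of 0x7000.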

2.3 — Lock Bits: Protecting Our Bootloader

📖 Datasheet §27.6, Table 27-7 — Boot Lock Bit table

BLB12	BLB11	Effect
1	1	No restrictions (default — dangerous!)
1	0	Application cannot WRITE to boot section ← we use this
0	1	Application cannot READ from boot section
0	0	Application cannot READ or WRITE boot section

Our Lock byte value is 0xEF. Set lock bits LAST, after the bootloader is flashed and working. Lock bits can only be cleared by a full chip erase, which also wipes your bootloader.

avrdude -c usbasp -p m328p -U lock:w:0xEF:m

2.4 — The Complete Fuse Setup

# 1. Set fuses (BOOTRST active, BOOTSZ = 512 words)

avrdude -c usbasp -p m328p -U hfuse:w:0xDC:m

# 2. Flash our bootloader binary

avrdude -c usbasp -p m328p -U flash:w:bootloader.hex

# 3. Set lock bits LAST (protect bootloader section)

avrdude -c usbasp -p m328p -U lock:w:0xEF:m

2.5 — Chapter 2 Summary

Thing	Value	Why
Bootloader size	512 words / 1024 bytes	Small enough, fits our code
Bootloader start	0x7C00	Calculated from BOOTSZ
HFUSE	0xDC	BOOTRST=0, BOOTSZ=10
Lock byte	0xEF	App cannot overwrite bootloader
Set fuses	First	With USBasp before anything
Set lock bits	Last	After bootloader is flashed and verified

Chapter 3: Toolchain Setup (Windows)

3.1 — Tools Already Installed via Arduino IDE

Since Arduino IDE 1.8.x is installed, you already have everything needed. No downloads required.

C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\

tools\avr-gcc\5.4.0-atmel3.6.1-arduino2\bin\

avr-gcc.exe ← the compiler

avr-objcopy.exe ← converts compiled output to .hex

avr-size.exe ← shows how big our binary is

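⚠️ Note: Replace <username> with your actual Windows username everywhere in the scripts below. The version folder names (such as 5.4.0-atmel3.6.1-arduino2) depend on the installed AVR boards package, so check the actual folder names on your machine.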

3.2 — Our Project Folder

C:\AVR_Bootloader\

├── src\

│ └── main.c ← our bootloader source code

├── build\ ← compiled output goes here

├── build.bat ← our build script

└── flash.bat ← our flash script

3.3 — The Build Batch File (build.bat)

Create build.bat in C:\AVR_Bootloader\ with this content:

@echo off

REM ATmega328P Bootloader Build Script

set AVR=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avr-gcc\5.4.0-atmel3.6.1-arduino2\bin

set SRC=src\main.c

set BUILD=build

set OUT=bootloader

set MCU=atmega328p

set F_CPU=16000000UL

set BOOT_ADDR=0x7C00

echo [1/4] Compiling...

%AVR%\avr-gcc.exe -mmcu=%MCU% -DF_CPU=%F_CPU% -Os -std=c99 ^

-Wl,--section-start=.text=%BOOT_ADDR% ^

-o %BUILD%\%OUT%.elf %SRC%

if errorlevel 1 goto error

echo [2/4] Creating .hex file...

%AVR%\avr-objcopy.exe -O ihex -R .eeprom %BUILD%\%OUT%.elf %BUILD%\%OUT%.hex

if errorlevel 1 goto error

echo [3/4] Checking binary size...

%AVR%\avr-size.exe --format=avr --mcu=%MCU% %BUILD%\%OUT%.elf

echo [4/4] Done!

echo Output: %BUILD%\%OUT%.hex

goto end

:error

echo BUILD FAILED!

:end

pause

3.4 — The Most Important Line Explained

-Wl,--section-start=.text=%BOOT_ADDR%

-Wl, → pass this flag to the linker

--section-start → place this section at this address

.text → where compiled code lives

=%BOOT_ADDR% → = 0x7C00 (our bootloader start address)

3.5 — Other Compiler Flags Explained

Flag	Purpose
-mmcu=atmega328p	Tells compiler exactly which AVR chip — sets correct register addresses
-DF_CPU=16000000UL	Defines F_CPU as 16MHz — used in Chapter 4 for baud rate calculation
-Os	Optimize for SIZE not speed — bootloader must fit in 1024 bytes!
-std=c99	Use C99 standard — allows cleaner code style

3.6 — Flash Script (flash.bat)

@echo off

REM ATmega328P Bootloader Flash Script

set AVRDUDE=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\bin\avrdude.exe

set AVRDUDE_CONF=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\etc\avrdude.conf

set HEX=build\bootloader.hex

set MCU=atmega328p

set PROGRAMMER=usbasp

echo [1/3] Setting fuses...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U hfuse:w:0xDC:m

if errorlevel 1 goto error

echo [2/3] Flashing bootloader...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U flash:w:%HEX%:i

if errorlevel 1 goto error

echo [3/3] Setting lock bits...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U lock:w:0xEF:m

if errorlevel 1 goto error

echo All done! Bootloader installed successfully.

goto end

:error

echo FAILED! Check USBasp connection.

:end

pause

3.7 — Chapter 3 Summary

File	Purpose
build.bat	Compiles src\main.c → build\bootloader.hex
flash.bat	Sets fuses, flashes hex, sets lock bits via USBasp

Chapter 4: UART From Scratch

4.1 — What is UART?

UART stands for Universal Asynchronous Receiver Transmitter. It is the simplest way two devices can talk — just two wires: TX (transmit) and RX (receive). Asynchronous means there is no shared clock wire. Both sides must agree on the speed beforehand — called the Baud Rate, measured in bits per second.

📖 Datasheet §19.1 — "The Universal Synchronous and Asynchronous serial Receiver and Transmitter (USART) is a highly flexible serial communication device"

4.2 — How UART Sends a Byte

When idle, the TX line sits HIGH. To send one byte (8 bits):

Frame = 1 start bit + 8 data bits + 1 stop bit = 10 bits total

Idle Start D0 D1 D2 D3 D4 D5 D6 D7 Stop Idle

────┐ ┌──┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌────────

│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │

└──┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘

At 115200 baud:

1 bit = 1 / 115200 = 8.68 microseconds

1 byte = 8.68 × 10 = 86.8 microseconds
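The frame layout and timing above can be reproduced with two small helpers. This is a host-side sketch with hypothetical function names; it only models the bit pattern and does not touch any UART hardware.

```c
#include <stdint.h>

/* Build the 10-bit 8N1 frame for one byte, in transmission order.
   Bit 0 of the result is sent first: start bit (0), then D0..D7
   (least significant data bit first), then the stop bit (1). */
uint16_t uart_frame_8n1(uint8_t data)
{
    return (uint16_t)((1u << 9) | ((uint16_t)data << 1));
}

/* Time for one 10-bit frame in nanoseconds at the given baud rate. */
uint32_t uart_frame_ns(uint32_t baud)
{
    return (uint32_t)(10ull * 1000000000ull / baud);
}
```

At 115200 baud uart_frame_ns() gives about 86,805 ns, which matches the 86.8 microseconds per byte worked out above.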

4.3 — The 5 Registers We Need

Register	Purpose
UBRR0	Sets the baud rate
UCSR0B	Enables transmitter and receiver
UCSR0C	Sets frame format (8 data bits, 1 stop bit)
UDR0	Write here to send, Read here to receive
UCSR0A	Status flags — is TX done? is RX ready?

4.4 — UBRR0: Setting the Baud Rate

📖 Datasheet §19.10 — USART Baud Rate Registers

The formula to calculate UBRR:

F_CPU

UBRR = ────────── - 1

16 × BAUD

For 16MHz CPU, 115200 baud:

16,000,000

UBRR = ──────────── - 1 = 8.68 - 1 = 7.68 → 8 (rounded)

16 × 115200

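📖 Datasheet §19.11, Table 19-9 — Confirms UBRR=8 for 16MHz / 115200 baud in normal-speed mode. The actual baud rate is about 111,111, an error of -3.5%, which is on the high side but usually works in practice. (Optiboot avoids this by enabling double-speed mode, U2X0, with UBRR=16 for a +2.1% error.)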

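#define F_CPU 16000000UL

#define BAUD 115200UL

/* Round to nearest: plain integer division truncates 8.68 to 8, giving 7 */

#define UBRR_VAL (((F_CPU + 8UL * BAUD) / (16UL * BAUD)) - 1) // = 8

UBRR0H = (uint8_t)(UBRR_VAL >> 8); // high byte first

UBRR0L = (uint8_t)UBRR_VAL; // low byte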
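Because integer division truncates, it is worth checking the UBRR arithmetic on a PC before trusting it. The sketch below uses hypothetical helper names and assumes normal-speed asynchronous mode (divide by 16), as in the formula above.

```c
#include <stdint.h>

/* UBRR for normal-speed asynchronous mode, rounded to the nearest integer. */
uint16_t ubrr_for(uint32_t f_cpu, uint32_t baud)
{
    return (uint16_t)((f_cpu + 8u * baud) / (16u * baud) - 1u);
}

/* Baud rate the hardware actually produces for a given UBRR. */
uint32_t actual_baud(uint32_t f_cpu, uint16_t ubrr)
{
    return f_cpu / (16u * ((uint32_t)ubrr + 1u));
}

/* Signed error in tenths of a percent, truncated toward zero
   (for example -35 means -3.5%). */
int32_t baud_error_tenths(uint32_t target, uint32_t actual)
{
    return (int32_t)(((int64_t)actual - (int64_t)target) * 1000 / (int64_t)target);
}
```

For 16MHz and 115200 baud this yields UBRR = 8, an actual rate of 111,111 baud, and an error of -3.5%. The unrounded expression F_CPU / (16 * BAUD) - 1 would evaluate to 7 instead, which is roughly 8.5% fast.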

4.5 — UCSR0B: Enabling TX and RX

📖 Datasheet §19.9.3 — UCSR0B description. We only need RXEN0 (bit 4) to enable receiver and TXEN0 (bit 3) to enable transmitter.

UCSR0B = (1 << RXEN0) | (1 << TXEN0);

4.6 — UCSR0C: Frame Format

We want 8N1: 8 data bits, No parity, 1 stop bit. This is the usual format for Arduino-style serial bootloaders.

📖 Datasheet §19.9.4 — UCSR0C description. UCSZ01:00 = 11 sets 8 data bits. UPM = 00 = no parity. USBS = 0 = 1 stop bit.

UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);

4.7 — Putting It Together: uart_init()

void uart_init(void)

{

UBRR0H = 0; // high byte of 8

UBRR0L = 8; // low byte of 8

UCSR0B = (1 << RXEN0) | (1 << TXEN0);

UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);

}

4.8 — UCSR0A: Status Flags

📖 Datasheet §19.9.1 — UCSR0A description. UDRE0: TX buffer empty — safe to send next byte. RXC0: New byte waiting to be read.

4.9 — uart_send() and uart_receive()

void uart_send(uint8_t byte)

{

while (!(UCSR0A & (1 << UDRE0))); // wait until TX buffer empty

UDR0 = byte; // hardware sends it automatically

}

uint8_t uart_receive(void)

{

while (!(UCSR0A & (1 << RXC0))); // wait until byte arrives

return UDR0; // read and return byte

}

4.10 — Chapter 4 Summary

Register	What We Set	Why
UBRR0H:L	8	115200 baud at 16MHz
UCSR0B	RXEN0 \| TXEN0	Enable RX and TX hardware
UCSR0C	UCSZ01 \| UCSZ00	8N1 frame format
UCSR0A	Read only	Check UDRE0 before send, RXC0 before receive
UDR0	Write to send, Read to receive	The actual data register

Chapter 5: The STK500v1 Protocol

5.1 — What is STK500v1?

When you hit Upload in Arduino IDE, Avrdude runs and starts sending bytes over UART to our bootloader following a protocol called STK500v1. Every exchange follows the same structure:

Every COMMAND Avrdude sends:

┌──────────┬──────────────────────┬──────────┐

│ CMD │ PARAMETERS │ SYNC │

│ 1 byte │ 0 or more bytes │ 0x20 │

└──────────┴──────────────────────┴──────────┘

CRC_EOP

Every RESPONSE we send:

┌──────────┬──────────────────────┬──────────┐

│ 0x14 │ DATA (if any) │ 0x10 │

└──────────┴──────────────────────┴──────────┘

INSYNC OK
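The response layout above (INSYNC, optional data, OK) can be sketched as a buffer-building helper. stk_build_response() is a hypothetical function written for illustration; the bootloader in this tutorial sends the bytes directly with uart_send() instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STK_OK      0x10
#define STK_INSYNC  0x14

/* Write INSYNC + payload + OK into out and return the number of bytes used.
   Returns 0 if out is too small to hold the whole response. */
size_t stk_build_response(uint8_t *out, size_t out_cap,
                          const uint8_t *payload, size_t payload_len)
{
    if (out_cap < payload_len + 2) return 0;
    out[0] = STK_INSYNC;
    if (payload_len) memcpy(&out[1], payload, payload_len);
    out[1 + payload_len] = STK_OK;
    return payload_len + 2;
}
```

A reply with no data is therefore always the two bytes 0x14 0x10, and a signature reply is five bytes: 0x14, the three signature bytes, then 0x10.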

5.2 — Protocol Constants

#define STK_OK 0x10

#define STK_INSYNC 0x14

#define STK_NOSYNC 0x15

#define STK_CRC_EOP 0x20

#define STK_GET_SYNC 0x30

#define STK_GET_PARAMETER 0x41

#define STK_SET_DEVICE 0x42

#define STK_SET_DEVICE_EXT 0x45

#define STK_ENTER_PROGMODE 0x50

#define STK_LEAVE_PROGMODE 0x51

#define STK_LOAD_ADDRESS 0x55

#define STK_PROG_PAGE 0x64

#define STK_READ_PAGE 0x74

#define STK_READ_SIGN 0x75

5.3 — Commands We Handle

Command	Parameters	What We Do
GET_SYNC (0x30)	None	Reply INSYNC + OK
GET_PARAMETER (0x41)	1 byte param ID	Reply version number
SET_DEVICE (0x42)	20 bytes	Drain bytes, reply OK
SET_DEVICE_EXT (0x45)	5 bytes	Drain bytes, reply OK
ENTER_PROGMODE (0x50)	None	Reply OK
LOAD_ADDRESS (0x55)	2 byte address	Store address, reply OK
PROG_PAGE (0x64)	2-byte length + memory type + data	Write to Flash (Ch6), reply OK
READ_PAGE (0x74)	2-byte length + memory type	Send Flash bytes back
READ_SIGN (0x75)	None	Send 0x1E 0x95 0x0F
LEAVE_PROGMODE (0x51)	None	Reply OK, jump to app (Ch7)

5.4 — Key Command Details

STK_LOAD_ADDRESS — Set Write Address

case STK_LOAD_ADDRESS:

{

uint16_t lo = uart_receive();

uint16_t hi = uart_receive();

address = (hi << 8) | lo;

/* address is a WORD address — multiply by 2 for SPM */

get_sync();

uart_send(STK_INSYNC);

uart_send(STK_OK);

break;

}
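The address handling can be checked in isolation. The two helpers below are a sketch with hypothetical names: the first combines the two LOAD_ADDRESS bytes (low byte first), the second converts the resulting word address to the byte address that the Flash-writing code needs.

```c
#include <stdint.h>

/* Combine the two LOAD_ADDRESS bytes (low byte arrives first) into a word address. */
uint16_t stk_word_address(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Convert a Flash word address to a byte address (each word is 2 bytes). */
uint32_t word_to_byte_address(uint16_t word_addr)
{
    return (uint32_t)word_addr * 2u;
}
```

For example, the bytes 0x40 0x00 give word address 0x0040, which is byte address 0x0080, the start of the second 128-byte page.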

STK_PROG_PAGE — Write Flash

case STK_PROG_PAGE:

{

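uint16_t len = (uint16_t)uart_receive() << 8; // length high byte arrives first

len |= uart_receive(); // then the low byte (separate statements guarantee the read order)

uint8_t type = uart_receive(); // 'F' = Flash

for (uint16_t i = 0; i < len; i++) {

uint8_t b = uart_receive(); // always drain the byte from the UART

if (i < PAGE_SIZE) page_buffer[i] = b; // never overrun page_buffer

}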

get_sync();

if (type == 'F')

write_flash_page(address, page_buffer, len); // Chapter 6

uart_send(STK_INSYNC);

uart_send(STK_OK);

break;

}

STK_READ_SIGN — Chip Signature

📖 Datasheet §27.8.1 — "The ATmega328P has a three byte signature code: 0x1E, 0x95, 0x0F"

case STK_READ_SIGN:

get_sync();

uart_send(STK_INSYNC);

uart_send(0x1E); // Atmel/Microchip

uart_send(0x95); // 32KB Flash

uart_send(0x0F); // ATmega328P

uart_send(STK_OK);

break;

5.5 — The Full Upload Sequence

Avrdude Our Bootloader

│── GET_SYNC (×several) ────────────►│

│◄── INSYNC + OK ───────────────────│

│── GET_PARAMETER (SW major/minor) ─►│

│◄── INSYNC + version + OK ─────────│

│── SET_DEVICE (20 bytes) ──────────►│ (ignored)

│── SET_DEVICE_EXT (5 bytes) ────────►│ (ignored)

│── ENTER_PROGMODE ────────────────►│

│── READ_SIGN ─────────────────────►│

│◄── INSYNC + 1E 95 0F + OK ────────│

│── LOAD_ADDRESS (page 0) ──────────►│ address = 0x0000

│── PROG_PAGE (128 bytes) ──────────►│ write page to flash

│── READ_PAGE (verify) ─────────────►│ read back and send

│ ... repeat for every page ... │

│── LEAVE_PROGMODE ────────────────►│

│◄── INSYNC + OK ───────────────────│

jump to 0x0000

Chapter 6: Flash Self-Programming

6.1 — What is Flash Self-Programming?

Normal code reads from Flash. Our bootloader needs to write to Flash — writing the incoming sketch data into the application section. This is called Self-Programming — the chip modifying its own Flash while running.

📖 Datasheet §27.1 — "The Boot program can use any available data interface and associated protocol to read code and write (program) that code into the Flash memory"

6.2 — The Most Important Constraint: Pages

You cannot write individual bytes to Flash. Flash is organized into fixed blocks called pages. You must write a whole page at a time.

📖 Datasheet §27.5 — "The Flash is organized in pages. When programming the Flash, the program data must be written one page at a time"

Flash page size = 64 WORDS = 128 BYTES

Total pages = 256 pages

256 pages × 128 bytes = 32,768 bytes = 32KB

📖 Datasheet §27.5, Table 27-5 — "Page Size: 64 words / 128 bytes"
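The page arithmetic is easy to get wrong by one, so here is a small sketch of it. The helper names are hypothetical; the only assumption is the 128-byte page size stated above.

```c
#include <stdint.h>

#define FLASH_PAGE_BYTES 128u

/* Index of the page that contains a byte address. */
uint16_t page_index(uint16_t byte_addr)
{
    return (uint16_t)(byte_addr / FLASH_PAGE_BYTES);
}

/* First byte address of the page that contains byte_addr. */
uint16_t page_base(uint16_t byte_addr)
{
    return (uint16_t)(byte_addr & ~(FLASH_PAGE_BYTES - 1u));
}

/* Number of whole pages needed to hold size_bytes (rounded up). */
uint16_t pages_needed(uint32_t size_bytes)
{
    return (uint16_t)((size_bytes + FLASH_PAGE_BYTES - 1u) / FLASH_PAGE_BYTES);
}
```

With these, byte address 0x7C00 falls on page 248, so an application that fills everything below the bootloader occupies exactly 248 pages.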

6.3 — The 3 Step Writing Process

Step 1 — ERASE the page

Flash bits can only go 1→0 when writing.

Erase resets all bits back to 1 (0xFF).

Must erase before writing new data.

Step 2 — FILL the page buffer

Load your 128 bytes into a temporary hardware

buffer inside the chip (word by word — 2 bytes at a time).

NOT written to Flash yet.

Step 3 — WRITE the page buffer to Flash

Hardware copies buffer into actual Flash page.

Takes ~3.7ms. The CPU halts only when the page is in the NRWW section; otherwise we poll until the write finishes.

⚠️ Critical: If you skip the erase step, bits that were 0 stay 0. Your new data gets corrupted. Always erase first.

📖 Datasheet §27.3 — Page Erase, Fill Temporary Buffer, Write Page from Temporary Buffer
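The reason for erasing first can be demonstrated with a tiny host-side model. This is not hardware access: flash_page_model_t and its functions are hypothetical and only imitate the rule that programming can clear bits but never set them.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 128

/* Host-side model of one Flash page (no real hardware involved). */
typedef struct { uint8_t cell[PAGE_BYTES]; } flash_page_model_t;

/* Erase: every bit returns to 1, so every byte becomes 0xFF. */
void model_erase(flash_page_model_t *p)
{
    memset(p->cell, 0xFF, PAGE_BYTES);
}

/* Program: bits can only be cleared, so the result is old AND new. */
void model_program(flash_page_model_t *p, const uint8_t *data)
{
    for (int i = 0; i < PAGE_BYTES; i++)
        p->cell[i] &= data[i];
}
```

Programming 0xF0 and then 0x0F without an erase in between leaves 0x00 in the model, not 0x0F, which is the kind of corruption the warning above describes.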

6.4 — The SPM Instruction and boot.h

SPM (Store Program Memory) is a special AVR assembly instruction. We use <avr/boot.h> which wraps it in clean C macros:

#include <avr/boot.h>

boot_page_erase(byte_address); // Erase one page

boot_spm_busy_wait(); // Wait for operation to complete

boot_page_fill(byte_address, w); // Fill one word (2 bytes) into buffer

boot_page_write(byte_address); // Write page buffer to Flash

boot_rww_enable(); // Re-enable app section reading

6.5 — RWW vs NRWW Sections

📖 Datasheet §27.1 — "The Flash memory is organized in two sections: Read-While-Write (RWW) and No Read-While-Write (NRWW)"

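On the ATmega328P the RWW section is 0x0000-0x6FFF and the NRWW section is the top 4KB, 0x7000-0x7FFF, regardless of BOOTSZ. Our bootloader (0x7C00-0x7FFF) sits in NRWW, so it can keep executing while an RWW page is erased or written. Pages in 0x7000-0x7BFF still belong to the application but lie in NRWW, so the CPU simply halts while they are programmed. After writing any RWW page, we must call boot_rww_enable() to re-enable reading from the RWW section.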

6.6 — write_flash_page(): The Full Function

#define PAGE_SIZE 128

void write_flash_page(uint16_t word_addr,

uint8_t *data,

uint16_t length)

{

/* Safety — never write into bootloader section */

if (word_addr >= (BOOT_START / 2))

return;

/* Convert word address to byte address for SPM */

uint32_t byte_addr = (uint32_t)word_addr * 2;

/* Step 1 — Erase the page (~3.7ms) */

boot_page_erase(byte_addr);

boot_spm_busy_wait();

/* Step 2 — Fill page buffer word by word */

for (uint16_t i = 0; i < length; i += 2)

{

uint16_t word = data[i] | ((uint16_t)data[i + 1] << 8);

boot_page_fill(byte_addr + i, word);

}

/* Step 3 — Write page buffer to Flash (~3.7ms) */

boot_page_write(byte_addr);

boot_spm_busy_wait();

/* Re-enable RWW section for reading */

boot_rww_enable();

}

6.7 — Timing

📖 Datasheet §27.8.1, Table 27-14 — Page Erase: 3.7ms. Page Write: 3.7ms. Total per page: ~7.4ms. Worst case for a full application section: 248 pages × 7.4ms ≈ 1.8 seconds (the top 8 pages hold the bootloader and are never rewritten).
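The timing estimate can be turned into a quick calculation for any sketch size. This is a rough sketch with a hypothetical function name; it uses the per-page erase and write figures quoted above and ignores the time spent receiving data over UART.

```c
#include <stdint.h>

#define PAGE_BYTES  128u
#define ERASE_US    3700u   /* per-page erase time quoted above */
#define WRITE_US    3700u   /* per-page write time quoted above */

/* Estimated Flash programming time in microseconds for a sketch of
   sketch_bytes bytes, rounding the size up to whole pages. */
uint32_t flash_time_us(uint32_t sketch_bytes)
{
    uint32_t pages = (sketch_bytes + PAGE_BYTES - 1u) / PAGE_BYTES;
    return pages * (ERASE_US + WRITE_US);
}
```

In practice the UART transfer dominates: at 115200 baud each 128-byte page takes about 11ms to receive, longer than the erase and write combined.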

6.8 — Chapter 6 Summary

Concept	Key Point
Page size	128 bytes — must write whole pages
3 steps	Erase → Fill buffer → Write
SPM	Hardware instruction for Flash writing
<avr/boot.h>	Clean C macros wrapping SPM
Timing	~7.4ms per page (erase + write)
Word vs byte	Multiply word address × 2 for SPM
RWW	Must call boot_rww_enable() after every write
Guard check	Never write at or above 0x7C00

Chapter 7: Jumping to the App & Watchdog Timer

7.1 — What is the Watchdog Timer?

The Watchdog Timer (WDT) is a completely independent hardware timer built into the ATmega328P. It runs on its own internal oscillator — separate from your main clock. Think of it like a dead man's switch: your code must periodically kick the watchdog to prove it is still alive. If it does not kick it in time — the watchdog resets the CPU. No exceptions.

📖 Datasheet §10.1 — "The Watchdog Timer is clocked from a separate on-chip oscillator which runs at 128kHz"

7.2 — Why Its Own Oscillator?

The Watchdog runs on its own 128kHz oscillator completely independent from the main 16MHz system clock. Even if your code completely locks up in an infinite loop, crashes into invalid memory, or the main oscillator glitches — the watchdog timer keeps counting and resets the chip.

7.3 — What Can the Watchdog Do?

📖 Datasheet §10.2 — Watchdog Timer modes

Mode	What Happens When Timer Expires
Reset mode ← we use	Chip resets immediately. PC goes to 0x0000 or 0x7C00
Interrupt mode	Fires an interrupt. Your ISR handles it. Code keeps running.
Both	First fires interrupt. If not cleared → then resets.

7.4 — Watchdog Timeout Periods

📖 Datasheet §10.3, Table 10-2 — Watchdog Timer prescale select

WDP bits	Timeout	Use Case
000	16 ms	Very tight safety loop
101	500 ms
110	1 sec	← we use this (upload window)
111	2 sec	Most common safety timeout
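The timeouts in the table follow from the oscillator frequency and the prescaler. As a sketch (hypothetical function name), assuming the nominal 128kHz oscillator and a base count of 2048 cycles that doubles with each prescaler step:

```c
#include <stdint.h>

#define WDT_OSC_HZ 128000ull   /* nominal watchdog oscillator frequency */

/* Nominal timeout in microseconds for prescaler setting wdp (0..9).
   The counter runs for 2048 << wdp oscillator cycles. */
uint32_t wdt_timeout_us(uint8_t wdp)
{
    uint64_t cycles = 2048ull << wdp;
    return (uint32_t)(cycles * 1000000ull / WDT_OSC_HZ);
}
```

Setting 6 works out to 1.024 seconds, which the table rounds to 1 second. The real oscillator is not precise, so these values should be treated as approximate.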

7.5 — Why We Need It In Our Bootloader

Without a timeout, our uart_receive() waits forever. If nobody connects, the bootloader is stuck and your sketch never runs. The watchdog timer gives us a 1-second window: if Avrdude connects, we disable the watchdog and proceed with upload. If nobody connects, the watchdog fires, chip resets, we detect WDRF and jump to the app immediately.

7.6 — Detecting a Watchdog Reset (MCUSR)

📖 Datasheet §8.4 — MCUSR register — tells us WHY the chip reset. WDRF (bit 3) is set when watchdog caused the reset.

/* Very first thing in main() */

uint8_t mcusr = MCUSR; // save reset reason

MCUSR = 0; // clear all flags

if (mcusr & (1 << WDRF))

{

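/* Watchdog reset → skip to app. Do NOT call jump_to_app() here: */

/* it re-arms the watchdog and would reset again in an endless loop */

wdt_disable(); // stop the watchdog before the app starts

((void (*)(void))0)(); // jump to 0x0000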

}
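The decision based on MCUSR can be tested on a PC by treating the saved register value as a plain byte. The sketch below uses hypothetical names; the bit positions mirror the ATmega328P MCUSR layout (PORF, EXTRF, BORF, WDRF in bits 0 to 3).

```c
#include <stdint.h>

/* Bit positions in MCUSR on the ATmega328P. */
#define BIT_PORF   0   /* power-on reset */
#define BIT_EXTRF  1   /* external reset pin */
#define BIT_BORF   2   /* brown-out reset */
#define BIT_WDRF   3   /* watchdog reset */

/* Hypothetical policy helper: returns 1 when the saved MCUSR value says the
   watchdog caused the reset, meaning the upload window should be skipped. */
int should_skip_bootloader(uint8_t saved_mcusr)
{
    return (saved_mcusr & (1u << BIT_WDRF)) != 0;
}
```

This is also why MCUSR is copied and cleared first thing in main(): the flags accumulate across resets, so a stale WDRF bit would otherwise make every later reset look like a watchdog reset.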

7.7 — Why We Cannot Just goto 0x0000

Problem 1: The watchdog keeps running. The app never pets it, watchdog fires after 1 second, chip resets, bootloader runs again — infinite loop.

Problem 2: Dirty hardware state. The bootloader has initialized UART and modified hardware state. The app inherits all that instead of a clean reset state and may break.

7.8 — The Correct Way: Watchdog Reset Trick

Instead of jumping → use the watchdog to RESET the chip!

1. Set watchdog to shortest timeout (16ms)

2. Do nothing (don't pet it)

3. Watchdog fires after 16ms

4. Chip fully resets — clean hardware state!

5. Bootloader runs again

6. Sees WDRF flag in MCUSR → jump to 0x0000 immediately

7. App runs with perfectly clean hardware state ✅

📖 Datasheet §10.8 — Watchdog System Reset Mode

7.9 — The Code

#include <avr/wdt.h>

/* Enable 1 second watchdog */

void watchdog_enable_1s(void)

{

wdt_enable(WDTO_1S);

}

/* Safe jump to application */

void jump_to_app(void)

{

wdt_enable(WDTO_15MS); // shortest timeout

while(1); // sit and wait for reset

/* After ~16ms → RESET → clean hardware state */

/* Bootloader restarts → sees WDRF → jumps to 0x0000 */

}

/* Disable watchdog once upload starts */

void watchdog_disable(void)

{

wdt_reset();

wdt_disable();

}

7.10 — Chapter 7 Summary

Concept	Key Point
Watchdog	Independent 128kHz hardware timer
Purpose	Resets chip if code hangs or crashes
Our use	1 second upload window timeout
MCUSR	Tells us WHY the chip reset
WDRF flag	Set when watchdog caused the reset
jump_to_app()	Set WDT to 16ms, loop, let it reset cleanly
Disable WDT	Call once Avrdude connects — upload takes time
Clean reset	Watchdog reset gives app a clean hardware state

Chapter 8: The Complete Bootloader

8.1 — The Complete main.c

Create this file at C:\AVR_Bootloader\src\main.c

/*

* ATmega328P Bootloader

* Compatible with Arduino IDE (STK500v1 / Avrdude)

* Application : 0x0000 - 0x7BFF (31,744 bytes)

* Bootloader : 0x7C00 - 0x7FFF (1,024 bytes)

* HFUSE = 0xDC LOCK = 0xEF UART = 115200 8N1

*/

#include <avr/io.h>

#include <avr/boot.h>

#include <avr/pgmspace.h>

#include <avr/interrupt.h>

#include <avr/wdt.h>

#include <stdint.h>

#define BOOT_START 0x7C00

#define PAGE_SIZE 128

#define F_CPU 16000000UL

#define BAUD 115200UL

#define UBRR_VAL (((F_CPU + 8UL * BAUD) / (16UL * BAUD)) - 1) /* rounds to 8; truncating division would give 7 */

#define STK_OK 0x10

#define STK_INSYNC 0x14

#define STK_NOSYNC 0x15

#define STK_CRC_EOP 0x20

#define STK_GET_SYNC 0x30

#define STK_GET_PARAMETER 0x41

#define STK_SET_DEVICE 0x42

#define STK_SET_DEVICE_EXT 0x45

#define STK_ENTER_PROGMODE 0x50

#define STK_LEAVE_PROGMODE 0x51

#define STK_LOAD_ADDRESS 0x55

#define STK_PROG_PAGE 0x64

#define STK_READ_PAGE 0x74

#define STK_READ_SIGN 0x75

#define SIGNATURE_0 0x1E

#define SIGNATURE_1 0x95

#define SIGNATURE_2 0x0F

static uint8_t page_buffer[PAGE_SIZE];

static uint16_t address = 0;

/* ── UART ── */

void uart_init(void) {

UBRR0H = (uint8_t)(UBRR_VAL >> 8);

UBRR0L = (uint8_t)(UBRR_VAL);

UCSR0B = (1 << RXEN0) | (1 << TXEN0);

UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);

}

void uart_send(uint8_t b) {

while (!(UCSR0A & (1 << UDRE0)));

UDR0 = b;

}

uint8_t uart_receive(void) {

while (!(UCSR0A & (1 << RXC0)));

return UDR0;

}

/* ── STK500v1 helper ── */

void get_sync(void) {

uint8_t eop = uart_receive();

if (eop != STK_CRC_EOP) uart_send(STK_NOSYNC);

}

/* ── Flash self-programming ── */

void write_flash_page(uint16_t word_addr, uint8_t *data, uint16_t len) {

if (word_addr >= (BOOT_START / 2)) return;

uint32_t byte_addr = (uint32_t)word_addr * 2;

boot_page_erase(byte_addr); boot_spm_busy_wait();

for (uint16_t i = 0; i < len; i += 2) {

uint16_t word = data[i] | ((uint16_t)data[i+1] << 8);

boot_page_fill(byte_addr + i, word);

}

boot_page_write(byte_addr); boot_spm_busy_wait();

boot_rww_enable();

}

/* ── Jump to application ── */

void jump_to_app(void) {

wdt_enable(WDTO_15MS);

while(1);

}

/* ── Main ── */

int main(void) {

uint8_t mcusr = MCUSR;

MCUSR = 0;

if (mcusr & (1 << WDRF)) {

wdt_disable();

((void (*)(void))0)(); // jump to 0x0000

}

wdt_enable(WDTO_1S);

uart_init();

while (1) {

uint8_t cmd = uart_receive();

switch (cmd) {

case STK_GET_SYNC:

wdt_disable();

get_sync();

uart_send(STK_INSYNC); uart_send(STK_OK);

break;

case STK_GET_PARAMETER: {

uint8_t p = uart_receive(); get_sync();

uart_send(STK_INSYNC);

if (p == 0x80) uart_send(0x02);

else if (p == 0x81) uart_send(0x01);

else uart_send(0x00);

uart_send(STK_OK); break; }

case STK_SET_DEVICE:

for (uint8_t i=0;i<20;i++) uart_receive();

get_sync();

uart_send(STK_INSYNC); uart_send(STK_OK); break;

case STK_SET_DEVICE_EXT:

for (uint8_t i=0;i<5;i++) uart_receive();

get_sync();

uart_send(STK_INSYNC); uart_send(STK_OK); break;

case STK_ENTER_PROGMODE:

get_sync();

uart_send(STK_INSYNC); uart_send(STK_OK); break;

case STK_LOAD_ADDRESS: {

uint16_t lo = uart_receive();

uint16_t hi = uart_receive();

address = (hi << 8) | lo;

get_sync();

uart_send(STK_INSYNC); uart_send(STK_OK); break; }

case STK_PROG_PAGE: {

uint16_t len = (uint16_t)uart_receive() << 8; // length, high byte first

len |= uart_receive(); // separate statement guarantees the read order

uint8_t type = uart_receive();

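for (uint16_t i=0;i<len;i++) {

uint8_t b = uart_receive();

if (i < PAGE_SIZE) page_buffer[i] = b; // drop bytes that would overflow the buffer

}

get_sync();

if (type=='F') write_flash_page(address,page_buffer,(len>PAGE_SIZE)?PAGE_SIZE:len);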

uart_send(STK_INSYNC); uart_send(STK_OK); break; }

case STK_READ_PAGE: {

uint16_t len = (uint16_t)uart_receive() << 8; // length, high byte first

len |= uart_receive(); // separate statement guarantees the read order

uint8_t type = uart_receive(); get_sync();

uart_send(STK_INSYNC);

if (type=='F')

for (uint16_t i=0;i<len;i++)

uart_send(pgm_read_byte((uint32_t)(address*2)+i));

uart_send(STK_OK); break; }

case STK_READ_SIGN:

get_sync();

uart_send(STK_INSYNC);

uart_send(SIGNATURE_0);

uart_send(SIGNATURE_1);

uart_send(SIGNATURE_2);

uart_send(STK_OK); break;

case STK_LEAVE_PROGMODE:

get_sync();

uart_send(STK_INSYNC); uart_send(STK_OK);

jump_to_app(); break;

default:

get_sync();

uart_send(STK_INSYNC); uart_send(STK_OK); break;

} // end switch

} // end while

return 0;

}
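
The value that UBRR_VAL produces depends on how the integer division rounds, so it is worth checking on a PC before flashing. The sketch below assumes normal-speed mode (U2X0 = 0), which is what uart_init configures. All helper names are hypothetical. It compares a truncating divisor with a round-to-nearest divisor and reports the resulting baud error in tenths of a percent.

```c
#include <stdint.h>

/* UBRR from the datasheet formula, with the division truncated. */
uint32_t ubrr_truncated(uint32_t f_cpu, uint32_t baud)
{
    return f_cpu / (16u * baud) - 1u;
}

/* Same formula, but rounded to the nearest integer divisor. */
uint32_t ubrr_rounded(uint32_t f_cpu, uint32_t baud)
{
    return (f_cpu + 8u * baud) / (16u * baud) - 1u;
}

/* Baud rate the USART actually generates for a given UBRR value. */
uint32_t actual_baud(uint32_t f_cpu, uint32_t ubrr)
{
    return f_cpu / (16u * (ubrr + 1u));
}

/* Signed error versus the requested baud rate, in tenths of a percent. */
int32_t baud_error_permille(uint32_t f_cpu, uint32_t baud, uint32_t ubrr)
{
    int64_t actual = (int64_t)actual_baud(f_cpu, ubrr);
    return (int32_t)((actual - (int64_t)baud) * 1000 / (int64_t)baud);
}
```

For 16 MHz and 115200 baud, truncation yields UBRR = 7 (125000 baud, about +8.5%), while rounding yields UBRR = 8 (111111 baud, about -3.5%). The smaller error is the safer choice for a link that has to talk to a PC.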

8.2 — Build It

Open a command prompt in C:\AVR_Bootloader\ and run build.bat. The critical check: the Program size must be 1024 bytes or less, because that is the size of our bootloader section.
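
The 1024-byte limit follows directly from the memory map in the header comment of main.c. A small host-side sketch of that arithmetic is shown below. The helper names are hypothetical, and FLASH_BYTES assumes the 32 KB flash of the ATmega328P.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_BYTES 32768UL   /* ATmega328P flash size (assumed: 32 KB) */
#define BOOT_START  0x7C00UL  /* byte address, as defined in main.c     */

/* Size of the bootloader section in bytes. */
uint32_t boot_section_bytes(void)
{
    return (uint32_t)(FLASH_BYTES - BOOT_START);
}

/* True when a program of the given size fits in the bootloader section. */
bool bootloader_fits(uint32_t program_bytes)
{
    return program_bytes <= boot_section_bytes();
}

/* Word address of the bootloader start (flash is addressed in 16-bit words). */
uint32_t boot_start_word(void)
{
    return (uint32_t)(BOOT_START / 2);
}
```

Here boot_section_bytes() evaluates to 1024, so a 1024-byte build still fits while a 1026-byte build does not.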

8.3 — Verify the .hex File

Open build\bootloader.hex in Notepad. The address field of the first line should read 7C00 (spaces are added here for readability; the real file has none):

:10 7C00 00 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx

    ▲

    └── 7C00 = our bootloader start address ✅

If you see 0000 here, the linker flag in build.bat is not taking effect.
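
The same check can be done in code instead of by eye. The sketch below is a minimal Intel HEX record reader, not part of the bootloader. It validates the record checksum and returns the 16-bit load address. The function names are hypothetical, and it does not inspect the record type, so it is only meaningful for data records such as the first line of the file.

```c
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse two hex digits at s into *out. Returns 0 on success, -1 on error. */
static int hex_byte(const char *s, uint8_t *out)
{
    int hi = hex_nibble(s[0]);
    if (hi < 0) return -1;
    int lo = hex_nibble(s[1]);
    if (lo < 0) return -1;
    *out = (uint8_t)((hi << 4) | lo);
    return 0;
}

/* Returns the load address of one record, or -1 if the line is malformed
 * or its checksum does not match. */
long hex_record_address(const char *line)
{
    size_t n = strlen(line);
    uint8_t count, b, sum = 0;
    long addr = 0;

    if (n < 11 || line[0] != ':') return -1;
    if (hex_byte(line + 1, &count) != 0) return -1;

    /* Bytes in a record: count + address (2) + type + data + checksum. */
    size_t total = 5u + count;
    if (n < 1u + 2u * total) return -1;

    for (size_t i = 0; i < total; i++) {
        if (hex_byte(line + 1 + 2 * i, &b) != 0) return -1;
        sum = (uint8_t)(sum + b);
        if (i == 1) addr = (long)b << 8;   /* address high byte */
        if (i == 2) addr |= b;             /* address low byte  */
    }
    /* All bytes including the checksum must sum to zero modulo 256. */
    return (sum == 0) ? addr : -1;
}
```

For a record such as ":027C00000C94E2", the function returns 0x7C00. A result of 0x0000 on the first line of bootloader.hex would indicate the relocation problem described above.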

8.4 — Flash It

Connect your USBasp and run flash.bat. It sets the fuses, flashes the bootloader, and then sets the lock bits, in that order.

8.5 — Test It

Test 1 — Timeout Works

Power on the board with no USB-Serial connected. Wait 2-3 seconds. Your existing sketch should run. This confirms the 1-second watchdog timeout and jump-to-app are working correctly.

Test 2 — Arduino IDE Upload Works

Open Arduino IDE, select Arduino Uno board, select the correct COM port, open the Blink sketch, and hit Upload. You should see bytes written and verified, then the LED starts blinking immediately.
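
During an upload, the host sends short command frames that the switch statement in main.c decodes. The sketch below builds three of those frames on the host side, in the byte order the handlers in main.c read them. It is an illustration only and is not taken from avrdude. The stk_build_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STK_CRC_EOP      0x20
#define STK_GET_SYNC     0x30
#define STK_LOAD_ADDRESS 0x55
#define STK_PROG_PAGE    0x64

/* GET_SYNC: command byte followed by the end-of-packet marker. */
size_t stk_build_get_sync(uint8_t *out)
{
    out[0] = STK_GET_SYNC;
    out[1] = STK_CRC_EOP;
    return 2;
}

/* LOAD_ADDRESS: carries a WORD address (byte address / 2), low byte first. */
size_t stk_build_load_address(uint16_t byte_addr, uint8_t *out)
{
    uint16_t word_addr = (uint16_t)(byte_addr / 2);
    out[0] = STK_LOAD_ADDRESS;
    out[1] = (uint8_t)(word_addr & 0xFF);
    out[2] = (uint8_t)(word_addr >> 8);
    out[3] = STK_CRC_EOP;
    return 4;
}

/* PROG_PAGE header: length high byte first, then memory type 'F' (flash).
 * The page data and a final STK_CRC_EOP follow this header on the wire. */
size_t stk_build_prog_page_header(uint16_t len, uint8_t *out)
{
    out[0] = STK_PROG_PAGE;
    out[1] = (uint8_t)(len >> 8);
    out[2] = (uint8_t)(len & 0xFF);
    out[3] = (uint8_t)'F';
    return 4;
}
```

For the second flash page (byte address 0x0080), stk_build_load_address produces 55 40 00 20, and the bootloader answers each accepted frame with STK_INSYNC followed by STK_OK.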

8.6 — Complete Project Structure

C:\AVR_Bootloader\

├── src\

│ └── main.c ← the bootloader source

├── build\

│ ├── bootloader.elf ← compiled binary (intermediate)

│ └── bootloader.hex ← final hex file (flashed to chip)

├── build.bat ← compiles main.c → bootloader.hex

└── flash.bat ← sets fuses, flashes hex, sets lock bits

8.7 — Complete Tutorial Summary

Chapter	What We Built
1 — How the chip boots	Memory map, BOOTRST fuse, two upload scenarios, startup flow
2 — Fuses & memory	BOOTSZ=512 words, start=0x7C00, HFUSE=0xDA, lock bits=0xEF
3 — Toolchain	Build and flash batch scripts for Windows using Arduino IDE tools
4 — UART	uart_init, uart_send, uart_receive — 3 registers, baud rate math
5 — STK500v1 protocol	10 commands, handshake, address loading, data transfer
6 — Flash self-programming	Erase, fill, write, 128-byte pages, boot.h macros
7 — Watchdog & safe jump	1 second window, WDRF detection, clean app handoff
8 — Complete bootloader	Everything assembled, built, flashed and tested

✅ Tutorial Complete!

You now have a fully working Optiboot-style bootloader written from scratch, with every line of code tied back to the ATmega328P datasheet.