Wednesday, June 3, 2026

OFFENSIVE SECURITY / MALWARE ANALYSIS / REVERSE ENGINEERING Concept Reference List — Complete Edition


================================================================
  RECOMMENDED LEARNING PATH
================================================================
  1.  C Programming
  2.  Assembly (x86/x64)
  3.  PE Format & Windows Internals
  4.  Debugging & Dynamic Analysis
  5.  Reverse Engineering
  6.  Shellcode Engineering
  7.  Exploit Development
  8.  Malware Internals & Code Injection
  9.  EDR Evasion Concepts
  10. Kernel Mode Programming
  11. Active Directory Tradecraft
  12. Firmware / Hypervisor Research

================================================================
  SECTION A — FOUNDATIONS
================================================================

----------------------------------------------------------------
A1. PE FILE INTERNALS
----------------------------------------------------------------
- DOS Header / NT Headers
  Every PE starts with IMAGE_DOS_HEADER (MZ magic); its e_lfanew field points
  to IMAGE_NT_HEADERS ("PE\0\0" signature, file header, and optional header)

- Section Headers & Alignment
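  .text (code), .data, .rdata, .rsrc; each section has a raw (file) offset/size
  and a virtual address/size, aligned to FileAlignment and SectionAlignment
  respectively (both set in the optional header)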

- Import Table (IAT / INT)
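  DLLs and functions the binary needs; the Import Name Table (INT) holds the
  names or ordinals, and the loader writes resolved addresses into the Import
  Address Table (IAT) at startup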

- Export Table
  Functions a DLL exposes to callers; has name, ordinal, and address arrays

- Relocations
  Base relocation table used when image can't load at preferred base address

- TLS Callbacks
  Thread Local Storage callbacks run BEFORE the entry point — common anti-debug
  trick since many debuggers break at EP, not TLS

- Delayed Imports
  Imports resolved lazily at first call rather than at load time

- Forwarded Exports
  An export that redirects to a function in another DLL
  (e.g., kernel32!HeapAlloc -> ntdll!RtlAllocateHeap)

- Resource Section (.rsrc)
  Embedded resources: icons, strings, version info, and sometimes payloads

- Manual Mapping
  Parsing and loading a PE by hand: map sections, fix relocations, resolve IAT,
  call TLS callbacks, then call entry point — foundation of reflective loading

- Relocation Fixups
  Patching absolute addresses when image loads at a different base than preferred
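The header layout above can be checked with a few fixed offsets. The following is a minimal, read-only sketch. It assumes the whole file is already loaded into a byte vector. The names parse_pe_headers, PeSummary and align_up are hypothetical. The offsets follow the documented IMAGE_DOS_HEADER (e_lfanew at 0x3C) and IMAGE_FILE_HEADER layouts. The sketch does not walk the section table or any data directory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Little-endian field readers; PE header fields are always little-endian.
static uint16_t read_u16(const std::vector<uint8_t>& b, size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}
static uint32_t read_u32(const std::vector<uint8_t>& b, size_t off) {
    return static_cast<uint32_t>(b[off]) |
           (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) |
           (static_cast<uint32_t>(b[off + 3]) << 24);
}

struct PeSummary {
    uint32_t nt_offset;           // IMAGE_DOS_HEADER.e_lfanew
    uint16_t machine;             // IMAGE_FILE_HEADER.Machine (0x8664 = x64, 0x14c = x86)
    uint16_t number_of_sections;  // IMAGE_FILE_HEADER.NumberOfSections
    uint16_t optional_magic;      // 0x10b = PE32, 0x20b = PE32+
};

// Validates the MZ and PE signatures and reads a few IMAGE_FILE_HEADER fields.
std::optional<PeSummary> parse_pe_headers(const std::vector<uint8_t>& b) {
    if (b.size() < 0x40 || b[0] != 'M' || b[1] != 'Z') return std::nullopt;
    uint32_t e_lfanew = read_u32(b, 0x3C);
    // Need the 4-byte signature, the 20-byte file header, and the 2-byte optional magic.
    if (static_cast<uint64_t>(e_lfanew) + 4 + 20 + 2 > b.size()) return std::nullopt;
    if (std::memcmp(&b[e_lfanew], "PE\0\0", 4) != 0) return std::nullopt;
    size_t fh = e_lfanew + 4;  // IMAGE_FILE_HEADER follows the signature
    PeSummary s{};
    s.nt_offset = e_lfanew;
    s.machine = read_u16(b, fh + 0);
    s.number_of_sections = read_u16(b, fh + 2);
    s.optional_magic = read_u16(b, fh + 20);  // first field of the optional header
    return s;
}

// Rounds a size up to FileAlignment or SectionAlignment (a power of two).
uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, align_up(0x1234, 0x200) rounds a 0x1234-byte section up to the next 0x200 file-alignment boundary, which is 0x1400.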


----------------------------------------------------------------
A2. WINDOWS INTERNALS
----------------------------------------------------------------
- Object Manager
  Kernel subsystem managing all named/unnamed kernel objects
  (files, events, mutexes, processes, threads)

- Handle Tables
  Per-process table mapping handle values to kernel object pointers

- Access Tokens & Security Reference Monitor (SRM)
  Tokens carry user SID, group SIDs, privileges; SRM enforces access checks

- ALPC (Advanced Local Procedure Call)
  High-performance IPC mechanism used internally by Windows (replaces LPC)

- Executive & Kernel layers
  HAL -> Kernel -> Executive (Ob, Mm, Io, Se, Ps, etc.) -> Subsystems

- Virtual Memory Manager (VMM)
  Manages VADs, page tables, working sets, paged/non-paged pool

- I/O Manager & IRP
  Manages driver stack communication via I/O Request Packets

- Session & Desktop isolation
  Sessions separate logons (Session 0 holds services); each session contains
  window stations, and each window station contains desktops


----------------------------------------------------------------
A3. SHELLCODE ENGINEERING
----------------------------------------------------------------
- Position-Independent Code (PIC)
  Code that works regardless of where it's loaded — no hardcoded addresses;
  uses delta offsets or GetPC techniques

- GetPC Techniques
  Getting the current instruction pointer value at runtime
  (e.g., CALL/POP trick, LEA RIP-relative on x64)

- Null-Byte Avoidance
  Many injection vectors treat 0x00 as string terminator; shellcode must
  avoid null bytes through instruction substitution

- Encoder / Decoder Stubs
  XOR, ROT, or custom encoders wrap shellcode; decoder runs first,
  decodes in-place, then jumps to payload

- Syscall Shellcode
  Shellcode that invokes syscalls directly without relying on API stubs

- Alphanumeric Shellcode
  Shellcode restricted to printable ASCII characters — bypasses filters
  that only allow text input

- Egg Hunters
  Small shellcode that searches process memory for a unique tag (egg)
  preceding the real payload — useful when injection space is limited

- Staged vs Stageless Payloads
  Stageless: entire payload in one blob
  Staged: small stager downloads and executes the real payload from a C2

- Stack Pivoting
  Redirect the stack pointer (RSP/ESP) to attacker-controlled memory
  to enable ROP chain execution

- ROP Chains (Return-Oriented Programming)
  Chain together existing code "gadgets" (ending in RET) to execute
  arbitrary logic without injecting new code — bypasses DEP/NX


================================================================
  SECTION B — EXPLOITATION
================================================================

----------------------------------------------------------------
B1. EXPLOIT DEVELOPMENT
----------------------------------------------------------------
- Buffer Overflow (Stack)
  Overwrite return address on the stack to redirect execution

- Buffer Overflow (Heap)
  Corrupt heap metadata or adjacent allocations to gain control

- Use-After-Free (UAF)
  Access memory after it has been freed; if reallocated with attacker
  data, leads to type confusion or code execution

- Heap Corruption
  Corrupt allocator metadata (free lists, chunk headers) to redirect writes

- Format String Vulnerabilities
  Uncontrolled format strings (%n, %x) allow arbitrary read/write

- Integer Overflows / Underflows
  Arithmetic wrapping leads to incorrect size calculations and
  exploitable allocations

- Race Conditions (TOCTOU)
  Time-of-check vs time-of-use: win a race between check and use
  to substitute a different resource

- DEP / NX Bypass
  Data Execution Prevention marks memory non-executable;
  bypassed via ROP, ret2libc, or JIT spraying

- ASLR Bypass
  Address Space Layout Randomization randomizes base addresses;
  bypassed via info leaks, partial overwrites, heap spraying, or brute force

- ROP / JOP / COP
  Return/Jump/Call Oriented Programming — code reuse attack variants

- Heap Feng Shui
  Carefully shape heap layout to place attacker data adjacent to
  target structures before triggering a vulnerability

- SEH Exploitation (Windows)
  Overwrite Structured Exception Handler chain to redirect execution
  on exception

- Browser Exploitation Concepts
  JIT compiler abuse, sandbox escapes, type confusion in JS engines,
  renderer vs browser process privilege separation

- Kernel Exploitation Basics
  NULL pointer dereference, pool overflows, race conditions in drivers,
  token stealing shellcode to escalate to SYSTEM
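The Integer Overflows entry is easiest to see in a size calculation. Below is a minimal sketch with hypothetical function names. It shows how a 32-bit count * size product wraps. It also shows the usual divide-before-multiply check, which rejects the overflow before any allocation happens.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Unchecked: with 32-bit unsigned operands the product silently wraps modulo
// 2^32, so a huge count can yield a tiny allocation size.
uint32_t unchecked_alloc_size(uint32_t count, uint32_t elem_size) {
    return count * elem_size;
}

// Checked: reject the multiplication if it would exceed UINT32_MAX.
std::optional<uint32_t> checked_alloc_size(uint32_t count, uint32_t elem_size) {
    if (elem_size != 0 && count > std::numeric_limits<uint32_t>::max() / elem_size) {
        return std::nullopt;  // would wrap: treat as an error, do not allocate
    }
    return count * elem_size;
}
```

Take count = 0x40000001 and elem_size = 4. The unchecked product wraps to 4, so a 4-byte buffer would be allocated for roughly 4 GiB of data. The checked version returns no value instead.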


================================================================
  SECTION C — MALWARE INTERNALS
================================================================

----------------------------------------------------------------
C1. PROCESS & MEMORY INTERNALS
----------------------------------------------------------------
- Process Hollowing
  Spawn a legit process suspended, hollow out its memory, replace with payload

- Process Doppelganging
  Write the payload into a file inside an NTFS transaction (TxF), map it as an
  image section, then roll the transaction back so the change never reaches disk

- Process Herpaderping
  Write the payload to a file, create an image section from it, then overwrite
  the file on disk before the initial thread starts; scanners that inspect the
  file at process creation see the benign content, not what was mapped

- Process Ghosting
  Create a file, mark it for deletion, map it as an image, then run it —
  appears to run from an already-deleted file

- PEB Walking
  Manually find loaded modules via the Process Environment Block (no API calls)

- VAD Manipulation
  Tamper with Virtual Address Descriptors to hide memory regions

- Page Table Manipulation
  Directly manipulate page tables at a lower level than VAD tricks

- Heap Spraying
  Fill heap with shellcode to increase odds of hitting it on overflow

- Pool Spraying
  Kernel-mode equivalent of heap spraying; targets kernel pool allocations

- EXE Packing (Custom Packer)
  Compress/encrypt an executable; stub decompresses and runs it at runtime

- DLL Memory Loading (Reflective DLL Injection)
  Load a DLL from a byte buffer in memory instead of from disk

- Thread Hijacking
  Suspend an existing thread, redirect its instruction pointer, resume it

- Memory Patching
  Overwrite bytes in a running process to change its behavior


----------------------------------------------------------------
C2. HOOKING TECHNIQUES
----------------------------------------------------------------
- Inline Hooking
  Overwrite a function's first bytes with a JMP to your handler (5 bytes for a
  rel32 JMP; an absolute jump on x64 typically needs 12-14 bytes)

- Trampoline Hooks
  Inline hook that also preserves and calls the original function

- Detours-style Hook
  Microsoft Detours approach — robust inline hook with trampoline

- IAT Hooking
  Replace function pointers in the Import Address Table

- VTable Hooking
  Overwrite C++ virtual function table pointers

- GOT/PLT Hooking (Linux)
  Overwrite Global Offset Table entries to redirect function calls

- SSDT Hooking
  Hook the kernel's System Service Descriptor Table (kernel mode)

- Kernel Callback Hooking
  Tamper with PsSetCreateProcessNotifyRoutine and similar callbacks
  to blind EDR/AV kernel drivers

- IRP Hooking
  Replace entries in a driver object's MajorFunction dispatch table to intercept
  the IRPs sent to that driver

- SYSENTER / SYSCALL Hooking
  Modify MSRs (IA32_LSTAR for SYSCALL, IA32_SYSENTER_EIP for SYSENTER) to
  intercept the syscall entry point


----------------------------------------------------------------
C3. CODE INJECTION TECHNIQUES
----------------------------------------------------------------
- Classic DLL Injection
  WriteProcessMemory + CreateRemoteThread -> LoadLibrary

- APC Injection
  Queue an Asynchronous Procedure Call to a thread's APC queue

- Early Bird Injection
  Inject via APC before the process fully initializes

- SetThreadContext Injection
  Redirect a suspended thread's context registers to shellcode

- Fiber Injection
  Hijack user-mode fibers to execute code inside a target process

- Transacted Hollowing
  Variant of Doppelganging using TxF (Transactional NTFS)

- Heaven's Gate
  Switch a 32-bit (WoW64) process into 64-bit mode mid-execution to bypass hooks
  placed in the 32-bit layer

- Atom Bombing
  Use Windows global atom tables as a data smuggling channel

- ptrace Injection (Linux)
  Use ptrace() syscall to read/write memory and registers of a process

- LD_PRELOAD Hijacking (Linux)
  Force a process to load your shared library before all others


----------------------------------------------------------------
C4. EVASION & ANTI-ANALYSIS
----------------------------------------------------------------
- API Unhooking
  Restore ntdll from a clean copy to remove AV/EDR hooks

- Direct Syscalls
  Invoke syscalls by number, bypassing hooked user-mode API stubs

- Indirect Syscalls
  Issue the syscall by jumping to the syscall instruction inside ntdll, so it
  executes from ntdll's address range rather than from unbacked memory

- Syscall Stomping
  Overwrite an unused syscall stub with your own to blend in

- Unhooking via KnownDlls Cache
  Load clean ntdll from the KnownDlls section object

- ETW Patching
  Patch ETW to blind event logging and telemetry

- Call Stack Spoofing / Return Address Spoofing
  Fake the call stack to hide the real caller from EDR stack walking

- Sleep Obfuscation
  Encrypt shellcode in memory while sleeping to evade memory scanning

- Stack Encryption
  Encrypt the stack during sleep/wait periods

- Gargoyle Memory Hiding
  Mark shellcode as non-executable while not running; flip back on timer

- Timing Attacks / Sleep Skipping Detection
  Detect sandbox time acceleration; behave benignly when detected

- PPID Spoofing
  Fake the parent process ID of a spawned process

- Misleading Disassembly
  Insert junk bytes or overlapping instructions to fool disassemblers

- Hardware Breakpoint Detection
  Read the debug registers (Dr0-Dr3 hold breakpoint addresses, Dr7 enables
  them), e.g., via GetThreadContext

- AMSI Bypass
  Patch or tamper with the Antimalware Scan Interface to blind
  script-based detection


================================================================
  SECTION D — PRIVILEGE & CREDENTIALS
================================================================

----------------------------------------------------------------
D1. CREDENTIAL & PRIVILEGE TECHNIQUES
----------------------------------------------------------------
- Token Impersonation
  Steal/duplicate another process's access token

- Pass-the-Hash
  Authenticate using an NTLM hash without the plaintext password

- LSASS Dumping
  Extract credential material from LSASS process memory

- DPAPI Abuse
  Decrypt DPAPI-protected secrets (Chrome cookies, Wi-Fi passwords, saved
  Windows credentials) via CryptUnprotectData or recovered DPAPI master keys

- Kerberoasting
  Request TGS tickets for SPNs and crack service account passwords offline

- Golden Ticket
  Forge a Kerberos TGT using the KRBTGT hash — full domain access

- Silver Ticket
  Forge a TGS for a specific service using that service account's hash;
  no DC contact needed

- Shadow Credentials
  Add key credentials to an AD object as a stealthy backdoor

- Skeleton Key
  Patch LSASS to accept a universal master password

- UAC Bypass
  Elevate from medium to high integrity without a UAC prompt

- ACL Abuse
  Exploit weak permissions on registry keys, services, or files


================================================================
  SECTION E — ACTIVE DIRECTORY TRADECRAFT
================================================================

----------------------------------------------------------------
E1. AD ATTACKS & ABUSE
----------------------------------------------------------------
- DCSync
  Impersonate a DC to request password hashes via MS-DRSR replication protocol

- DCShadow
  Register a rogue DC temporarily to push malicious AD changes

- BloodHound Graph Abuse
  Use BloodHound-collected AD relationship data to find attack paths
  to Domain Admin

- Constrained Delegation Abuse
  Abuse services allowed to delegate to specific targets to impersonate users

- Resource-Based Constrained Delegation (RBCD)
  Write msDS-AllowedToActOnBehalfOfOtherIdentity to gain impersonation rights

- NTLM Relay
  Capture and relay NTLM authentication to authenticate to other services

- PetitPotam
  Coerce a DC to authenticate to an attacker via MS-EFSRPC — feeds NTLM relay

- PrinterBug (SpoolSample)
  Abuse the Print Spooler to coerce DC authentication

- Zerologon (CVE-2020-1472)
  Cryptographic flaw in Netlogon — set DC machine account password to empty

- AdminSDHolder Abuse
  Modify the AdminSDHolder ACL; the SDProp process copies it onto protected
  accounts and groups (every 60 minutes by default), re-applying the change

- SID History Abuse
  Add high-priv SID to a user's SID history as a stealthy backdoor

- Kerberos Delegation (Unconstrained)
  Hosts with unconstrained delegation cache the TGTs of users who authenticate
  to them; coerce a DC to authenticate to one and capture the DC's TGT


================================================================
  SECTION F — DEFENSIVE INTERNALS & EDR CONCEPTS
================================================================

----------------------------------------------------------------
F1. EDR / DETECTION ENGINEERING INTERNALS
----------------------------------------------------------------
- AMSI (Antimalware Scan Interface)
  Windows API that allows AV/EDR to inspect script content
  (PowerShell, VBScript, JScript) before execution

- ETW (Event Tracing for Windows) Providers & Consumers
  Kernel and user-mode components emit structured events;
  EDRs subscribe to security-relevant providers for telemetry

- ETWTI (ETW Threat Intelligence)
  ETW provider specifically for kernel-level process/thread telemetry
  used by modern EDRs; harder to blind than user-mode hooks

- Sysmon Internals
  Sysinternals tool using kernel callbacks and ETW to log process
  creation, network, registry, file, and driver events

- Userland vs Kernel Telemetry
  Userland (IAT/inline hooks on ntdll) vs kernel (callbacks, ETW, minifilters)
  — kernel telemetry is far harder to evade

- Minifilter Drivers
  Kernel drivers that attach to the filter manager to intercept file I/O;
  used by AV/EDR to scan files on access

- Kernel Callbacks
  PsSetCreateProcessNotifyRoutine, PsSetLoadImageNotifyRoutine,
  CmRegisterCallback — EDRs use these for visibility; malware tries to remove them

- CFG (Control Flow Guard)
  Compiler+OS mitigation: validates indirect call targets against a bitmap
  of valid function entry points

- CET / Hardware Shadow Stack
  Intel CET pushes return addresses to a separate shadow stack protected
  by hardware; defeats ROP chains that corrupt the normal stack

- PatchGuard (KPP)
  Kernel Patch Protection: periodically checks integrity of SSDT, IDT,
  GDT, and other kernel structures; BSODs on tampering

- HVCI / VBS (Hypervisor-Protected Code Integrity / Virtualization Based Security)
  Uses the hypervisor to create an isolated environment (VTL1) that hosts
  Credential Guard's isolated LSA and HVCI's code integrity checks; HVCI makes
  unsigned kernel code execution nearly impossible

- Protected Process Light (PPL)
  Restricts which processes can open handles to sensitive processes
  (like LSASS) with certain access rights

- LSASS Protection
  The RunAsPPL registry value makes LSASS run as a protected process (PPL);
  dumping it then requires kernel access (e.g., a signed driver) or a PPL bypass


================================================================
  SECTION G — REVERSE ENGINEERING
================================================================

----------------------------------------------------------------
G1. REVERSE ENGINEERING SKILLS
----------------------------------------------------------------
- Static Analysis
  Reading disassembly without running it (IDA Pro, Ghidra, Binary Ninja)

- Dynamic Analysis
  Running under a debugger (x64dbg, WinDbg)

- Anti-Debug Tricks
  IsDebuggerPresent, NtQueryInformationProcess, timing checks, TLS callbacks

- Hardware Breakpoint Detection
  Detect debuggers via debug register inspection (Dr0-Dr7)

- Unpacking
  Extracting real payload from a packed/compressed executable

- Deobfuscation
  Recovering readable code from obfuscated or encrypted samples

- Binary Patching
  Modifying compiled binaries to change behavior

- Binary Diffing
  Comparing two binary versions to find changes (Diaphora, BinDiff)
  — essential for patch analysis and 1-day research

- Emulation / Unicorn Engine
  Run shellcode in an emulated CPU without a full OS environment

- Taint Tracking / Symbolic Execution
  Follow attacker-controlled data through a binary (taint) and solve for inputs
  that reach a chosen path (symbolic execution), e.g., with Angr or Triton

- Debugger Scripting
  Automate analysis with IDAPython, x64dbg's Python API, WinDbg JS


================================================================
  SECTION H — LINUX & CROSS-PLATFORM
================================================================

----------------------------------------------------------------
H1. LINUX TECHNIQUES
----------------------------------------------------------------
- ptrace Injection
  Linux syscall for process inspection/control; abuse for code injection

- LD_PRELOAD Hijacking
  Force a process to load your shared library before system libraries;
  override functions like read(), write(), getuid()

- GOT / PLT Hooking
  Overwrite Global Offset Table to redirect function calls in ELF binaries

- ELF Internals
  ELF header, program headers, section headers, dynamic segment,
  symbol tables — Linux equivalent of PE format knowledge

- /proc Manipulation
  /proc/[pid]/mem for reading/writing process memory;
  /proc/[pid]/maps for layout; used in Linux injection techniques

- eBPF Rootkits
  Extended Berkeley Packet Filter programs run in kernel context;
  can hook syscalls and hide processes/network connections

- Linux Capabilities Abuse
  Fine-grained privilege system (CAP_SYS_ADMIN, CAP_NET_RAW, etc.)
  — misconfigurations lead to container escapes and privilege escalation

- cron / systemd Persistence
  Classic persistence via crontab entries or malicious systemd units
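As with PE, ELF analysis starts with the identification bytes. Below is a minimal sketch that assumes the file is already in a byte vector. The names parse_elf_ident and ElfIdent are hypothetical. It reads only e_ident, e_type and e_machine, because those fields sit at the same offsets (0-15, 16, 18) in both 32-bit and 64-bit ELF headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct ElfIdent {
    bool is_64bit;       // EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    bool little_endian;  // EI_DATA: 1 = ELFDATA2LSB, 2 = ELFDATA2MSB
    uint16_t e_type;     // ET_REL = 1, ET_EXEC = 2, ET_DYN = 3, ET_CORE = 4
    uint16_t e_machine;  // e.g. 62 = EM_X86_64, 183 = EM_AARCH64
};

// Checks the \x7FELF magic, then decodes the class, the byte order, and the
// two 16-bit fields that follow e_ident (in the file's own byte order).
std::optional<ElfIdent> parse_elf_ident(const std::vector<uint8_t>& b) {
    if (b.size() < 20) return std::nullopt;
    if (b[0] != 0x7F || b[1] != 'E' || b[2] != 'L' || b[3] != 'F') return std::nullopt;
    uint8_t cls = b[4], data = b[5];
    if ((cls != 1 && cls != 2) || (data != 1 && data != 2)) return std::nullopt;
    bool le = (data == 1);
    auto rd16 = [&](size_t off) -> uint16_t {
        return le ? static_cast<uint16_t>(b[off] | (b[off + 1] << 8))
                  : static_cast<uint16_t>((b[off] << 8) | b[off + 1]);
    };
    return ElfIdent{cls == 2, le, rd16(16), rd16(18)};
}
```

A 64-bit little-endian shared object or PIE executable would report is_64bit = true, e_type = 3 (ET_DYN) and, on x86-64, e_machine = 62.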


================================================================
  SECTION I — PERSISTENCE MECHANISMS
================================================================

- Registry Run Keys
  HKCU\Software\Microsoft\Windows\CurrentVersion\Run

- Scheduled Tasks
  Via COM or XML; survive reboots

- COM Hijacking
  Replace a legitimate COM object with your own DLL

- DLL Proxying / DLL Side-Loading
  Malicious DLL named to match what a legit app expects; forward real exports

- WMI Subscriptions
  Trigger payloads on system events

- Boot/Login Scripts via GPO
  Scripts in SYSVOL executed at boot/login

- SID History Abuse
  Add a high-privilege SID to a user's SID history as a stealthy backdoor

- SIH Abuse
  Abuse Windows maintenance scheduled tasks

- Boot/Pre-OS (Bootkit)
  MBR/VBR level persistence


================================================================
  SECTION J — FIRMWARE & HARDWARE
================================================================

- UEFI Bootkit
  Persist in SPI flash firmware (LoJax, CosmicStrand) — survives reinstalls

- SMM (System Management Mode) Rootkit
  Executes in SMRAM, invisible to OS; triggered by SMIs

- PCIe DMA Attacks
  Read/write host memory via PCIe/Thunderbolt without CPU involvement (PCILeech)

- ACPI Table Tampering
  Embed malicious code in custom ACPI methods


================================================================
  SECTION K — HYPERVISOR & VM CONCEPTS
================================================================

- VM Exits
  Conditions that cause a guest VM to trap back to the hypervisor (VMM);
  hypervisors monitor sensitive instructions via VM exits

- EPT Hooking (Extended Page Tables)
  Hook guest physical memory mappings at the hypervisor level —
  invisible to the guest OS; used in stealth monitors and rootkits

- Blue Pill Rootkit Concept
  Transparently insert a hypervisor under a running OS; OS is unaware
  it's now a VM guest

- Hypervisor Introspection (VMI)
  Inspect guest VM memory and state from the hypervisor without
  touching the guest — powerful for transparent monitoring

- Intel VT-x Internals
  VMX root/non-root operation, VMCS fields, VMLAUNCH/VMRESUME,
  EPT, VPID — foundational for building a hypervisor

- CPUID Fingerprinting
  Detect virtualization via CPUID hypervisor bit and vendor strings

- Timing-Based VM Detection
  RDTSC delta differences between bare metal and VM environments

- SGX Enclaves
  Intel Software Guard Extensions — isolated encrypted memory regions
  even the OS/hypervisor can't read; used for secrets and anti-analysis

- TPM Abuse Concepts
  Trusted Platform Module sealing/unsealing secrets tied to platform state;
  research into PCR manipulation and TPM-based malware resilience


================================================================
  SECTION L — NETWORK, C2 & TRAFFIC EVASION
================================================================

- C2 Protocol Mimicry
  Disguise traffic as: HTTPS, DNS, MS Graph API, Telegram, Slack, OneDrive

- JA3 / JA3S Fingerprinting
  Fingerprint TLS clients/servers from handshake parameters;
  EDRs/NDRs use this to identify C2 tools

- JARM Fingerprint Spoofing
  Alter how a C2 server answers JARM's active TLS probes so its fingerprint
  does not match known C2 tooling

- HTTP/2 C2
  Use HTTP/2 multiplexing to blend C2 traffic into normal web traffic

- QUIC-Based Transport
  UDP-based protocol; harder to inspect than TCP/TLS streams

- Domain Fronting
  Route C2 through a CDN with an innocuous domain in DNS/SNI and the real C2
  host in the HTTP Host header; largely mitigated, replaced by CDN impersonation

- Dead Drop Resolvers
  Store C2 address in a public service (Twitter, Pastebin, GitHub)
  so the real C2 IP never appears in the binary

- DGA (Domain Generation Algorithms)
  Algorithmically generate hundreds of domain names; only the attacker
  knows which one is registered today

- Fast Flux DNS
  Rapidly rotate IPs behind a C2 domain to evade IP blocklists

- Peer-to-Peer Botnets
  Decentralized C2 with no single point of failure; nodes relay commands

- Traffic Shaping
  Throttle and time C2 beacons to mimic normal user browser traffic

- Covert Channels
  Hide data in protocol fields not meant for data (DNS TXT, ICMP payload,
  HTTP headers, image steganography)

- C2 Over WebSocket / gRPC
  Modern protocol channels that blend naturally into enterprise traffic

- Living Off the Land (LOLBins)
  Use built-in Windows binaries to avoid dropping files:
  mshta, regsvr32, cscript, wmic, certutil, rundll32, msiexec, bitsadmin
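On the detection side, the JA3 fingerprint mentioned above is built from ClientHello fields in a fixed order. Below is a sketch of the string-building step. It assumes the ClientHello has already been parsed into lists of integers. The function names are hypothetical. The field order and GREASE skipping follow the published JA3 description, and the final MD5 step is left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// GREASE values (RFC 8701) are 0x0A0A, 0x1A1A, ... 0xFAFA; JA3 ignores them.
bool is_grease(uint16_t v) {
    return (v & 0x0F0F) == 0x0A0A && (v >> 8) == (v & 0xFF);
}

// Joins the non-GREASE values of one ClientHello field with '-'.
std::string join_field(const std::vector<uint16_t>& values) {
    std::string out;
    for (uint16_t v : values) {
        if (is_grease(v)) continue;
        if (!out.empty()) out += '-';
        out += std::to_string(v);
    }
    return out;
}

// Builds the JA3 input string in the order
// SSLVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats.
// The JA3 fingerprint itself is the MD5 hex digest of this string
// (not computed here, to keep the sketch dependency-free).
std::string ja3_string(uint16_t version,
                       const std::vector<uint16_t>& ciphers,
                       const std::vector<uint16_t>& extensions,
                       const std::vector<uint16_t>& curves,
                       const std::vector<uint16_t>& point_formats) {
    return std::to_string(version) + "," + join_field(ciphers) + "," +
           join_field(extensions) + "," + join_field(curves) + "," +
           join_field(point_formats);
}
```

For example, ja3_string(771, {0x0A0A, 4865, 4866}, {0, 10, 11}, {29, 23}, {0}) yields "771,4865-4866,0-10-11,29-23,0". The GREASE cipher 0x0A0A is dropped, and empty fields leave consecutive commas.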


================================================================
  SECTION M — ADVANCED RESEARCH TOPICS
================================================================

- DKOM (Direct Kernel Object Manipulation)
  Directly modify kernel structures (e.g., unlink a process from
  ActiveProcessLinks to hide it from task managers)

- Object Callbacks
  ObRegisterCallbacks — kernel mechanism for object open/duplicate
  notification; abused by anti-cheat and rootkits alike

- Heaven's Gate Variants
  Beyond 32->64 mode switch: variants for syscall table switching
  and WoW64 layer abuse

- Gargoyle Memory Hiding
  Execute shellcode, then mark it non-executable and hide it in heap;
  re-arm via timer to re-execute later

- Sleep Obfuscation Techniques
  Encrypt implant in memory during sleep: Ekko, Foliage, Cronos variants

- Stack Encryption
  XOR or AES the stack during wait periods to evade memory scanning

- Return Address Spoofing
  Overwrite return addresses on the stack to fake call origin

- Intel VT-x Internals
  VMCS, EPT, VM exits — foundation for building custom hypervisors

- Kernel Patch Protection (PatchGuard) Internals
  How PatchGuard works: encrypted timer callbacks, integrity checks,
  randomized scheduling — and why bypassing it is extremely difficult

- ETWTI (ETW Threat Intelligence Provider)
  Kernel ETW provider emitting thread/process events used by modern EDRs;
  tampering with it requires kernel access and risks PatchGuard detection


================================================================
  SECTION N — LEARNING RESOURCES
================================================================

Courses:
  - OSCP   (Offensive Security Certified Professional)
  - OSED   (Offensive Security Exploit Developer)
  - CRTO   (Certified Red Team Operator)
  - CRTE   (Certified Red Team Expert — AD focused)
  - Sektor7 Malware Development (intro + intermediate + rootkits)
  - SANS FOR610  (Reverse Engineering Malware)
  - SANS SEC760  (Advanced Exploit Development)
  - TCM Security Malware Analysis Courses

Books:
  - The Shellcoder's Handbook
  - Practical Malware Analysis (Sikorski & Honig)
  - Windows Internals Parts 1 & 2 (Russinovich et al.)
  - The Art of Memory Forensics
  - Rootkits: Subverting the Windows Kernel
  - Hacking: The Art of Exploitation (Erickson)
  - The Web Application Hacker's Handbook

Disassemblers / Decompilers:
  - IDA Pro            (industry standard)
  - Ghidra             (free, NSA open-source)
  - Binary Ninja       (scriptable, modern UI)
  - Cutter / Rizin     (free open-source)

Debuggers:
  - x64dbg             (Windows user-mode)
  - WinDbg / WinDbg Preview  (kernel + user-mode)
  - GDB + pwndbg/peda  (Linux)

Dynamic Instrumentation:
  - Frida              (scriptable, cross-platform)
  - DynamoRIO          (binary translation framework)
  - PIN (Intel)        (x86 instrumentation)

System Inspection:
  - Process Hacker / System Informer
  - Process Monitor (ProcMon)
  - API Monitor

Network Analysis:
  - Wireshark
  - Zeek / Bro
  - Fakenet-NG         (dynamic network analysis for malware)

Emulation / Symbolic Execution:
  - Unicorn Engine     (CPU emulation)
  - Angr               (symbolic execution)
  - Triton             (dynamic taint + symbolic)

Hardware / DMA:
  - PCILeech / MemProcFS

Practice Environments:
  - TryHackMe
  - HackTheBox
  - VulnHub
  - Any.run            (online sandbox)
  - MalwareBazaar      (real samples)
  - Flare-VM           (Windows RE environment)
  - REMnux             (Linux RE environment)
  - pwn.college        (exploit development)

================================================================
  NOTE: These concepts are for educational purposes —
  malware analysis, red teaming, CTFs, and security research.
  Always operate within legal boundaries and in authorized
  environments (your own lab, CTFs, bug bounty programs).
================================================================

Monday, June 1, 2026

Reverse Engineering: Understanding the Thoughts Behind Systems


A software program or hardware system is usually the result of people organizing ideas, logic, constraints, and decisions to solve a problem. The final product becomes a kind of “frozen thinking” expressed through:

  • code
  • circuit layouts
  • protocols
  • algorithms
  • mechanical structures
  • data formats
  • timing behavior
  • UI decisions
  • optimization tricks

So reverse engineering is often the process of working backward from the finished system to understand:

  • what problem the creators were solving
  • how the system works internally
  • why certain design decisions were made
  • what assumptions or constraints existed
  • how components interact

In software, that may involve:

  • studying binaries
  • analyzing assembly
  • tracing execution
  • reconstructing algorithms
  • understanding data structures

In hardware, it may involve:

  • tracing PCB connections
  • identifying chips
  • analyzing signals
  • reconstructing schematics
  • understanding timing and electrical behavior

So in a philosophical sense, reverse engineering can feel like “reading the engineers’ thought process” indirectly through the artifact they created.

But it’s important to understand a distinction:

You are not literally reading their thoughts — you are inferring them from evidence left behind in the design.

Sometimes those inferences are accurate.
Sometimes multiple different thought processes could produce the same result.

For example:

  • an unusual algorithm might reveal a performance optimization mindset
  • extra security checks may reveal concern about tampering
  • elegant modular design may show emphasis on maintainability
  • messy duplicated logic may show deadline pressure or rapid iteration

Experienced reverse engineers often become good at recognizing “engineering fingerprints”:

  • compiler patterns
  • coding styles
  • architectural habits
  • optimization strategies
  • hardware design conventions

In that sense, reverse engineering is partly technical analysis and partly reasoning about human design decisions.



Software Architectures for Arduino and Embedded Systems


1 - Monolithic Architecture

Monolithic architecture is the simplest and most common approach used in small Arduino projects. In this design, nearly all functionality is placed directly inside the main Arduino sketch using the setup() and loop() functions. Sensor reading, display handling, communication, and control logic are all processed sequentially inside a single program structure.

This architecture is easy to understand and requires minimal memory, making it suitable for beginners and small microcontrollers such as the Arduino Uno and Nano. However, as the project grows larger, the code can become difficult to maintain because all system components are tightly connected.


2 - Modular Architecture

Modular architecture divides the firmware into separate modules or source files, where each module handles a specific responsibility such as sensor management, display control, communication, or storage.

This approach improves code organization, readability, debugging, and reusability. Developers can modify one module without significantly affecting the rest of the system. Modular architecture is widely used in medium-sized Arduino and embedded projects because it scales better than a monolithic design.


3 - Layered Architecture

Layered architecture organizes firmware into multiple logical layers. Common layers include application logic, middleware or services, hardware abstraction, drivers, and direct hardware interaction.

Each layer communicates only with the layers directly above and below it. This structure improves portability and maintainability because hardware-specific code is separated from application logic. Layered architecture is common in professional embedded systems and advanced microcontroller frameworks.
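Below is a minimal sketch of this layering in standard C++. It uses a hypothetical DigitalOutput interface as the hardware abstraction layer. On a real board, a driver class would implement write() with digitalWrite(). Here a hypothetical RecordingOutput fake stands in for the driver, so the upper layer can run on a desktop without hardware.

```cpp
#include <vector>

// Hardware abstraction layer: the only view of the pin the upper layers get.
class DigitalOutput {
public:
    virtual ~DigitalOutput() = default;
    virtual void write(bool high) = 0;
};

// Service/application layer: decides what happens (blinking), never how
// the pin is driven.
class Blinker {
public:
    explicit Blinker(DigitalOutput& led) : led_(led) {}
    void toggle() {
        on_ = !on_;
        led_.write(on_);
    }

private:
    DigitalOutput& led_;
    bool on_ = false;
};

// Driver-layer stand-in: records writes instead of touching hardware.
// A board-specific driver would implement write() with digitalWrite().
class RecordingOutput : public DigitalOutput {
public:
    void write(bool high) override { history.push_back(high); }
    std::vector<bool> history;
};
```

Blinker depends only on DigitalOutput. Porting it to another board therefore means writing one new driver class, and the application code stays the same.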


4 - Event-Driven Architecture

Event-driven architecture is based on reacting to events instead of continuously checking every subsystem in sequence. Events may include button presses, timer expirations, sensor triggers, serial communication, or network messages.

When an event occurs, the firmware executes a corresponding handler function. This architecture improves responsiveness and is commonly used in menu systems, IoT devices, robotics, and automation systems.
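A minimal sketch of handler dispatch, written as plain C so it also compiles on a PC. The event names, the counters and `dispatch()` are hypothetical. On a board, the events would come from button polling, timer checks or incoming serial bytes, and the handlers would do real work.

```c
#include <stddef.h>

/* Hypothetical event types for a small device */
typedef enum { EV_BUTTON_PRESS, EV_TIMER_EXPIRED, EV_SERIAL_BYTE, EV_COUNT } Event;

typedef void (*EventHandler)(void);

/* Counters stand in for real work so the behavior can be checked on a PC */
int button_presses = 0;
int timer_ticks = 0;

void handle_button(void) { button_presses++; }
void handle_timer(void)  { timer_ticks++; }

/* One entry per event type; NULL means the event is ignored */
static const EventHandler handlers[EV_COUNT] = {
    [EV_BUTTON_PRESS]  = handle_button,
    [EV_TIMER_EXPIRED] = handle_timer,
    [EV_SERIAL_BYTE]   = NULL,
};

/* Run the handler registered for an event; returns 1 if one ran */
int dispatch(Event ev)
{
    if ((unsigned)ev >= EV_COUNT || handlers[ev] == NULL)
        return 0;
    handlers[ev]();
    return 1;
}
```

A table of function pointers keeps the event-to-handler mapping in one place, so adding an event means adding one table entry rather than another branch in loop().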


5 - State Machine Architecture

State machine architecture organizes firmware behavior into defined states such as idle, running, paused, error, or sleep. The system transitions between these states depending on conditions or events.

This architecture provides predictable system behavior and simplifies debugging. State machines are widely used in robotics, automation controllers, industrial systems, and embedded devices that require clear operational flow.


6 - Finite State Machine (FSM)

A finite state machine is a formal implementation of a state machine where transitions between states are explicitly defined.

FSMs are commonly used in communication protocols, menu systems, LED animation controllers, and sequential process control because they provide clear and structured logic flow.
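An explicit transition table is one common way to write an FSM in C. The states and inputs below are hypothetical and follow the idle/running/paused/error example above. Because every (state, input) pair has a listed next state, no combination is left unhandled.

```c
/* Hypothetical states and inputs for a simple machine controller */
typedef enum { ST_IDLE, ST_RUNNING, ST_PAUSED, ST_ERROR, ST_COUNT } State;
typedef enum { EV_START, EV_PAUSE, EV_RESUME, EV_STOP, EV_FAULT, EV_RESET, EV_COUNT } Input;

/* next_state[current][input]: every combination is spelled out explicitly */
static const State next_state[ST_COUNT][EV_COUNT] = {
    /*               START       PAUSE      RESUME      STOP      FAULT     RESET      */
    [ST_IDLE]    = { ST_RUNNING, ST_IDLE,   ST_IDLE,    ST_IDLE,  ST_ERROR, ST_IDLE    },
    [ST_RUNNING] = { ST_RUNNING, ST_PAUSED, ST_RUNNING, ST_IDLE,  ST_ERROR, ST_RUNNING },
    [ST_PAUSED]  = { ST_PAUSED,  ST_PAUSED, ST_RUNNING, ST_IDLE,  ST_ERROR, ST_PAUSED  },
    [ST_ERROR]   = { ST_ERROR,   ST_ERROR,  ST_ERROR,   ST_ERROR, ST_ERROR, ST_IDLE    },
};

/* Return the state after applying one input; bad arguments fail safe to ST_ERROR */
State fsm_step(State s, Input in)
{
    if ((unsigned)s >= ST_COUNT || (unsigned)in >= EV_COUNT)
        return ST_ERROR;
    return next_state[s][in];
}
```

Only the RESET input leaves the error state, which makes that rule visible at a glance instead of being buried in if/else chains.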


7 - Cooperative Multitasking

Cooperative multitasking simulates multitasking without using an operating system. The firmware is divided into multiple short tasks that execute repeatedly inside the main loop.

Each task must return quickly so the other tasks get their turn; a single blocking call stalls all of them. Timing is therefore handled by comparing millis() timestamps instead of calling blocking functions such as delay(). This is one of the most common patterns in Arduino development.
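The core of the pattern is a non-blocking "is it time yet?" check. In this sketch the current time is passed in as `now` (on Arduino it would come from `millis()`), and `Task` and `task_due()` are hypothetical names. Because the subtraction is done in unsigned 32-bit arithmetic, the check keeps working when the millisecond counter wraps around after about 49.7 days.

```c
#include <stdint.h>

/* A task remembers when it last ran and how often it should run */
typedef struct {
    uint32_t last_run_ms;
    uint32_t interval_ms;
} Task;

/* Returns 1 (and records the run time) if the task is due at time 'now'.
   now - last_run_ms is computed modulo 2^32, so it stays correct across wrap. */
int task_due(Task *t, uint32_t now)
{
    if ((uint32_t)(now - t->last_run_ms) >= t->interval_ms) {
        t->last_run_ms = now;
        return 1;
    }
    return 0;
}
```

In loop(), each task then becomes one line such as `if (task_due(&blink, millis())) toggle_led();` (toggle_led() being a hypothetical helper), and loop() itself never waits.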


8 - Scheduler-Based Architecture

Scheduler-based architecture uses a scheduler to determine when tasks should run. Tasks may execute periodically at fixed intervals such as every few milliseconds or seconds.

This approach simplifies timing management and improves organization in projects containing multiple timed operations. Scheduler libraries are commonly used in automation and sensor-based systems.
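A table-driven scheduler can be sketched in a few lines. `ScheduledTask`, `scheduler_tick()` and the two example jobs are hypothetical; the caller passes the current time in milliseconds (from `millis()` on Arduino). Advancing `last_ms` by the period, rather than setting it to `now`, keeps a fixed cadence even when a tick runs late.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*TaskFn)(void);

/* One row per periodic job: what to run, how often, and when it last ran */
typedef struct {
    TaskFn   fn;
    uint32_t period_ms;
    uint32_t last_ms;
} ScheduledTask;

/* Example jobs for a host test: counters stand in for real I/O */
int sensor_reads = 0;
int display_updates = 0;
void read_sensor(void)    { sensor_reads++; }
void update_display(void) { display_updates++; }

/* Run every task whose period has elapsed at time 'now'; returns how many ran */
size_t scheduler_tick(ScheduledTask *tasks, size_t count, uint32_t now)
{
    size_t ran = 0;
    for (size_t i = 0; i < count; i++) {
        if ((uint32_t)(now - tasks[i].last_ms) >= tasks[i].period_ms) {
            tasks[i].last_ms += tasks[i].period_ms;   /* fixed cadence, no drift */
            tasks[i].fn();
            ran++;
        }
    }
    return ran;
}
```

On Arduino, loop() would call `scheduler_tick(tasks, 2, millis())` on every pass. If loop() stalls for several periods, a task then runs on consecutive ticks until it catches up; setting `last_ms = now` instead skips the missed runs.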


9 - RTOS Architecture

RTOS architecture uses a real-time operating system such as FreeRTOS to manage multitasking. Tasks run independently and may use priorities, queues, semaphores, and synchronization mechanisms.

This architecture enables true multitasking and is commonly used on advanced microcontrollers such as ESP32, STM32, and RP2040. RTOS systems are suitable for complex real-time applications but require more memory and system resources.


10 - Actor Architecture

Actor architecture divides the system into independent software actors that communicate through messages instead of shared variables.

Each actor processes information independently, improving modularity and concurrency handling. This architecture is more common in advanced embedded systems and multicore microcontrollers.


11 - Service-Oriented Architecture

Service-oriented architecture divides firmware into services such as networking, storage, display management, sensor processing, or LED control.

Each service provides specific functionality through defined APIs. This architecture improves separation of concerns and is commonly used in IoT firmware and smart device systems.


12 - Plugin Architecture

Plugin architecture allows features or modules to be added or removed independently. In Arduino systems, plugins are usually compile-time modules because smaller microcontrollers typically cannot load binary modules dynamically.

This architecture is common in configurable firmware such as LED effect systems and home automation controllers.


13 - Component-Based Architecture

Component-based architecture builds the system using reusable software components. Each component encapsulates its own functionality and interfaces.

This approach improves reusability and maintainability and is commonly used in robotics frameworks, GUI systems, and large embedded applications.


14 - Dataflow Architecture

Dataflow architecture organizes processing as a flow of data through multiple stages such as acquisition, filtering, transformation, and output.

This architecture is useful in digital signal processing, sensor fusion, audio processing, and data streaming systems because it clearly represents how information moves through the firmware.


15 - Interrupt-Driven Architecture

Interrupt-driven architecture uses hardware or software interrupts to respond immediately to important events such as timer overflows, UART communication, encoder pulses, or GPIO changes.

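Interrupts improve responsiveness and timing precision. However, interrupt handlers must stay short: on AVR, other interrupts are held off while a handler runs, so a slow handler delays them and can cause missed events. Variables shared between a handler and the main loop should be declared volatile.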
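A common way to keep handlers short is to let the ISR only store incoming data in a ring buffer and leave the processing to the main loop. The sketch below is plain C and runs on a PC; `RingBuffer`, `rb_push()` and `rb_pop()` are hypothetical names, and it assumes exactly one producer (the ISR) and one consumer (the main loop). On an 8-bit AVR, one-byte indices are read and written atomically, which is why they are `uint8_t`.

```c
#include <stdint.h>

#define RB_SIZE 16u   /* must be a power of two so the mask below works */

typedef struct {
    volatile uint8_t data[RB_SIZE];
    volatile uint8_t head;   /* written only by the ISR (producer) */
    volatile uint8_t tail;   /* written only by the main loop (consumer) */
} RingBuffer;

/* Called from the ISR: store one byte, drop it if the buffer is full */
int rb_push(RingBuffer *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail)
        return 0;                 /* full: losing a byte beats blocking in an ISR */
    rb->data[rb->head] = byte;
    rb->head = next;              /* publish only after the data is stored */
    return 1;
}

/* Called from the main loop: fetch one byte if one is available */
int rb_pop(RingBuffer *rb, uint8_t *out)
{
    if (rb->tail == rb->head)
        return 0;                 /* empty */
    *out = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    return 1;
}
```

On an AVR, `rb_push()` would be called from an ISR such as `ISR(USART_RX_vect)` with the byte read from UDR0, and `rb_pop()` from loop(). One slot is always left empty so that "full" and "empty" can be told apart, giving a capacity of 15 bytes here.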


16 - Reactive Architecture

Reactive architecture continuously reacts to changing system conditions. Examples include responding to sensor thresholds, battery voltage changes, or communication events.

This architecture is widely used in automation systems, smart sensors, and adaptive embedded devices.


17 - Command Architecture

Command architecture processes commands received from serial communication, EEPROM, SD cards, filesystems, or network interfaces.

Commands may control LEDs, animations, settings, or device operations. This approach is useful in configurable firmware, scripting systems, and automation controllers.
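A line-based command parser might look like the sketch below. The command set (`LED ON`, `LED OFF`, `BRIGHT <0-255>`), `Device` and `run_command()` are hypothetical; the caller is assumed to have already collected one complete line, without the newline, from Serial, EEPROM or a file.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical device state controlled by commands */
typedef struct {
    int led_on;
    int brightness;   /* 0..255 */
} Device;

typedef enum { CMD_OK, CMD_UNKNOWN, CMD_BAD_ARG } CmdResult;

/* Parse one line such as "LED ON" or "BRIGHT 128" and apply it */
CmdResult run_command(Device *dev, const char *line)
{
    if (strcmp(line, "LED ON") == 0)  { dev->led_on = 1; return CMD_OK; }
    if (strcmp(line, "LED OFF") == 0) { dev->led_on = 0; return CMD_OK; }

    if (strncmp(line, "BRIGHT ", 7) == 0) {
        char *end;
        long v = strtol(line + 7, &end, 10);
        /* reject empty numbers, trailing junk and out-of-range values */
        if (end == line + 7 || *end != '\0' || v < 0 || v > 255)
            return CMD_BAD_ARG;
        dev->brightness = (int)v;
        return CMD_OK;
    }
    return CMD_UNKNOWN;
}
```

Returning a result code instead of printing lets the same parser serve serial, SD card or network input; the caller decides how to report errors.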


18 - Pipeline Architecture

Pipeline architecture divides operations into sequential stages where data flows from one stage to another.

For example, a system may read data, decode it, process it, and display it in separate stages. This architecture is useful for streaming systems, binary processing, and LED animation engines.
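A pipeline can be expressed as an array of stage functions applied in order. The three stages here are hypothetical and assume a 10-bit ADC with a 5 V reference: raw counts are decoded to millivolts, converted to a percentage of full scale, then clamped for display.

```c
#include <stddef.h>
#include <stdint.h>

/* Each stage takes one value and returns the transformed value */
typedef int32_t (*Stage)(int32_t);

/* Hypothetical stages, assuming a 10-bit ADC (0..1023) and a 5 V reference */
int32_t decode_mv(int32_t raw)   { return raw * 5000 / 1023; }   /* counts -> millivolts */
int32_t to_percent(int32_t mv)   { return mv * 100 / 5000; }     /* millivolts -> % of full scale */
int32_t clamp_percent(int32_t p) { return p < 0 ? 0 : (p > 100 ? 100 : p); }

/* Push one value through every stage in order */
int32_t run_pipeline(const Stage *stages, size_t count, int32_t value)
{
    for (size_t i = 0; i < count; i++)
        value = stages[i](value);
    return value;
}
```

Because each stage only sees its input value, a stage can be tested, replaced or reordered without touching the others.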


19 - MVC Architecture

MVC stands for Model-View-Controller. This architecture separates application data, visual representation, and user interaction into different sections.

Although less common in small embedded systems, MVC is useful in touchscreen interfaces, menu-driven systems, and graphical user interfaces.


20 - Hardware Abstraction Layer (HAL)

A hardware abstraction layer provides generic interfaces for hardware operations while hiding low-level hardware details.

Instead of directly controlling registers or GPIO pins throughout the firmware, the application uses abstracted hardware functions. HAL improves portability and simplifies migration between different microcontroller platforms such as AVR, ESP32, STM32, and RP2040.
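One way to build a HAL in C is a struct of function pointers: application code calls only through the struct, and each platform supplies its own implementation. `GpioHal`, `blink_once()` and the mock backend are hypothetical. The mock records pin states so the application logic can be tested on a PC; a real AVR backend would write the DDRx/PORTx registers instead.

```c
#include <stdint.h>

/* Hypothetical HAL interface: the application only sees these functions */
typedef struct {
    void (*pin_mode_output)(uint8_t pin);
    void (*pin_write)(uint8_t pin, uint8_t level);
} GpioHal;

/* Application code written only against the HAL */
void blink_once(const GpioHal *hal, uint8_t pin)
{
    hal->pin_mode_output(pin);
    hal->pin_write(pin, 1);
    hal->pin_write(pin, 0);
}

/* Mock backend for host testing: records what the application asked for */
uint8_t mock_levels[32];
uint8_t mock_is_output[32];
int mock_writes = 0;

static void mock_mode_output(uint8_t pin) { if (pin < 32) mock_is_output[pin] = 1; }
static void mock_write(uint8_t pin, uint8_t level)
{
    if (pin < 32) { mock_levels[pin] = level; mock_writes++; }
}

const GpioHal mock_hal = { mock_mode_output, mock_write };
```

Porting to another microcontroller then means writing one new `GpioHal` instance; `blink_once()` and the rest of the application stay unchanged.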



Saturday, May 16, 2026

ATmega328P Bootloader

A Step-by-Step Tutorial

Writing an Optiboot-style UART Bootloader from Scratch

For Windows | Arduino IDE 1.8.x | ATmega328P @ 16MHz


Chapter 1: How the ATmega328P Boots

 

1.1 — What is the ATmega328P at its Core?

The ATmega328P is an 8-bit microcontroller made by Microchip (formerly Atmel). It contains three separate memory types:

 

Memory   Size   Purpose                          Survives power off?
------   ----   ------------------------------   -------------------
Flash    32KB   Stores your program code         Yes
SRAM     2KB    Stores variables while running   No
EEPROM   1KB    Stores small persistent data     Yes

 

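Flash is the memory our bootloader writes to, so it is the one we care about. SRAM only holds the bootloader's variables and stack while it runs, and EEPROM is not involved in booting at all.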

 

📖 Datasheet §1 — "The ATmega328P is a low-power CMOS 8-bit microcontroller based on the AVR enhanced RISC architecture"

 

1.2 — The Program Counter (PC)

The CPU has a special internal register called the Program Counter (PC). It holds the address of the next instruction to execute. The CPU repeats the following cycle millions of times per second (most AVR instructions take a single clock cycle, so up to about 16 million per second at 16MHz):

 

loop forever:

    1. Read instruction at address stored in PC

    2. Execute that instruction

    3. Increment PC (or jump if instruction says so)

 

📖 Datasheet §6.3 — "The Program Counter (PC) is 14 bits wide, addressing the 16K word (32KB) program memory space"

 

⚠️ Note: PC counts in WORDS (2 bytes each). 2^14 = 16,384 words = 32,768 bytes = 32KB. You will see both word and byte addresses in the datasheet.

 

1.3 — The Flash Memory Map in Detail

The full 32KB Flash with real addresses:

 

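BYTE ADDRESS    CONTENT
┌────────────────────────────────────────────┐
│ 0x0000   ← RESET vector                    │
│ 0x0004   ← INT0 vector                     │
│ 0x0008   ← INT1 vector                     │
│ ...         other interrupt vectors        │
│             (4 bytes each, 26 in total)    │
├────────────────────────────────────────────┤
│                                            │
│          APPLICATION SECTION               │
│          (your sketch / program)           │
│                                            │
├────────────────────────────────────────────┤
│ 0x7C00   ← BOOT START (BOOTSZ = 512 words) │
│          BOOTLOADER SECTION                │
│          (our code lives here)             │
│ 0x7FFF   ← TOP OF FLASH                    │
└────────────────────────────────────────────┘
         Total: 32,768 bytes (32KB)

Each vector is a 2-word (4-byte) JMP instruction, which is why consecutive vectors are 4 byte addresses apart. The datasheet's vector table lists the same entries as word addresses (0x0000, 0x0002, 0x0004, ...).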

 

📖 Datasheet §27.5, Table 27-5 — Defines boot section sizes and start addresses

 

1.4 — What Happens in the First Nanoseconds After Reset?

Things that cause a reset: power-on (VCC rises), the RESET pin pulled low, a Watchdog Timer timeout (this is also how software deliberately resets the chip), and brown-out detection.

 

📖 Datasheet §8.1 — "The ATmega328P provides several reset sources"

 

Regardless of reset source, the sequence is always:

 

Reset occurs
     │
     ▼
CPU initializes internally (I/O registers set to their initial values)
     │
     ▼
┌─────────────────────────────────────────┐
│ CPU checks: Is BOOTRST fuse programmed? │
└─────────────────────────────────────────┘
        │                         │
        NO                       YES
        │                         │
        ▼                         ▼
   PC = 0x0000              PC = 0x7C00
   Your app runs            Bootloader runs

(0x7C00 is the byte address; the word-counting PC actually holds 0x3E00.)

 

📖 Datasheet §27.4 — "If the BOOTRST Fuse is programmed, the reset vector is pointing to the Boot Flash start address"

 

1.5 — The BOOTRST Fuse Bit

Fuse bits are configuration bits stored outside Flash in dedicated hardware. They survive power cycling and are written with a programmer like USBasp.

 

⚠️ AVR Fuse Convention: Bit = 1 means UNPROGRAMMED (not active, factory default). Bit = 0 means PROGRAMMED (active). So BOOTRST = 0 means bootloader IS active. This inverted logic confuses everyone at first!

 

1.6 — The Two Upload Scenarios

Scenario A: USBasp Without a Bootloader

The USBasp talks SPI directly to the chip hardware (ICSP header). It writes raw bytes to Flash starting at 0x0000. No bootloader is needed or involved. Even a completely blank chip can be programmed this way. BOOTRST fuse is NOT set, so CPU starts at 0x0000 and your code runs immediately.

 

Scenario B: Arduino IDE Upload (With Bootloader)

The Arduino IDE runs Avrdude which talks to the bootloader running on the chip over UART. A DTR pin pulse triggers a hardware reset, the chip resets with BOOTRST set so PC goes to 0x7C00, the bootloader starts, waits for Avrdude, receives the sketch over UART, writes it to Flash at 0x0000, then jumps to the application.

 

 

                            USBasp (no bootloader)        Arduino IDE (with bootloader)
                            ---------------------------   -----------------------------
How it writes Flash         SPI/ICSP hardware             UART + bootloader software
Reset goes to               0x0000 always                 0x7C00 (bootloader) first
Bootloader needed?          No                            Yes
Can overwrite bootloader?   Yes (a chip erase wipes it)   No (lock bits protect it)

 

1.7 — Every Startup, Without Exception

Once a bootloader is installed and BOOTRST is set, every single startup follows this flow:

 

POWER ON or RESET
        │
        ▼
PC = 0x7C00  (BOOTRST forces this)
        │
        ▼
BOOTLOADER RUNS
   1. Initialize UART
   2. Wait ~1 second for Avrdude
        │
   ┌────┴─────────────────┐
   │                      │
Avrdude connects       No response (timeout)
   │                      │
   ▼                      ▼
Receive & flash        Jump to 0x0000
sketch                 YOUR SKETCH RUNS

Normal boot without an upload: Reset → bootloader → timeout → 0x0000

 

1.8 — Chapter 1 Summary

 

Concept            Key Fact
----------------   ------------------------------------------------------------------
Program Counter    Holds address of next instruction; starts at 0x0000 or 0x7C00 (byte addresses)
Flash layout       Application at bottom (0x0000), bootloader at top (0x7C00)
BOOTRST fuse       0 = programmed = CPU starts at boot section on reset
AVR fuse logic     0 = active, 1 = inactive (inverted, always remember this!)
USBasp upload      SPI directly to hardware, no bootloader needed, writes from 0x0000
Arduino upload     UART to bootloader, bootloader writes sketch to 0x0000
Every startup      If BOOTRST set → bootloader always runs first, then jumps to app
Bootloader's job   Check for new upload → if none, hand control to application

Chapter 2: Bootloader Section & Fuses

 

We only care about 3 things from the fuse system:

 

1. BOOTRST  → Tell CPU to start at bootloader on reset

2. BOOTSZ   → Tell CPU how big our bootloader is (sets start address)

3. Lock bits → Protect bootloader from being overwritten

 

2.1 — BOOTSZ: Choosing Our Bootloader Size

📖 Datasheet §27.5, Table 27-5 — Boot section sizes and start addresses

 

BOOTSZ1   BOOTSZ0   Size (words)   Size (bytes)   Start address (byte)
-------   -------   ------------   ------------   --------------------
1         1         256 words      512 bytes      0x7E00
1         0         512 words      1024 bytes     0x7C00  ← we use this
0         1         1024 words     2048 bytes     0x7800
0         0         2048 words     4096 bytes     0x7000

 

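We pick 512 words (1024 bytes) starting at 0x7C00. Optiboot, for comparison, fits in 256 words (512 bytes) at 0x7E00, but only thanks to heavily size-tuned code; a straightforward C bootloader like ours is likely to need more headroom. Larger sections (1024 words and up) would only take space away from the application.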

 

2.2 — The Fuse Bytes (Only What We Touch)

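The ATmega328P has 3 fuse bytes (low, high and extended). We only touch the High Fuse Byte (HFUSE). Our HFUSE value is 0xDC (binary 1101 1100): BOOTRST = 0 (active), BOOTSZ1:0 = 10 (512 words), and every other bit left at its factory default. Note that 0xDA, a value found in some older Arduino board definitions, selects BOOTSZ = 01 instead: a 1024-word section starting at 0x7800.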

 

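⚠️ Critical: SPIEN (bit 5) must stay 0 (programmed/active). It enables SPI programming; with it at 1 you could no longer program the chip with USBasp. The datasheet notes that SPIEN is not accessible in serial programming mode, so a USBasp cannot actually change it, but every HFUSE value you write should still keep bit 5 at 0.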

 

📖 Datasheet §27.4, Table 27-3 — High Fuse Byte bit description

 

avrdude -c usbasp -p m328p -U hfuse:w:0xDC:m
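Because the fuse logic is inverted, it is easy to pick a wrong value. The host-side helper below decodes a high fuse byte using the bit positions from the datasheet's High Fuse Byte table (BOOTRST = bit 0, BOOTSZ1:0 = bits 2:1, SPIEN = bit 5). `HfuseInfo` and `decode_hfuse()` are hypothetical names. For example, 0xDC decodes to BOOTRST active with a 512-word section at 0x7C00, 0xDA to a 1024-word section at 0x7800, and 0xDE to 256 words at 0x7E00.

```c
#include <stdint.h>

/* ATmega328P high fuse byte: bit 0 = BOOTRST, bits 2:1 = BOOTSZ1:0, bit 5 = SPIEN.
   Remember: 0 = programmed (active), 1 = unprogrammed. */
typedef struct {
    int      bootrst_active;   /* 1 if reset jumps to the boot section */
    uint16_t boot_words;       /* boot section size in words */
    uint16_t boot_start_byte;  /* byte address where the boot section starts */
    int      spi_enabled;      /* 1 if ISP (USBasp) programming stays possible */
} HfuseInfo;

HfuseInfo decode_hfuse(uint8_t hfuse)
{
    HfuseInfo info;
    uint8_t bootsz = (hfuse >> 1) & 0x03;                      /* BOOTSZ1:0 */
    info.bootrst_active  = (hfuse & 0x01) == 0;
    info.boot_words      = (uint16_t)(256u << (3u - bootsz));  /* 11->256 ... 00->2048 */
    info.boot_start_byte = (uint16_t)(0x8000u - 2u * info.boot_words);
    info.spi_enabled     = (hfuse & 0x20) == 0;
    return info;
}
```

Running a candidate value through a check like this before typing the avrdude command costs nothing and catches the easy mistake of flipping the wrong BOOTSZ bit.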

 

2.3 — Lock Bits: Protecting Our Bootloader

📖 Datasheet §27.6, Table 27-7 — Boot Lock Bit table

 

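BLB12   BLB11   Effect on the boot section
-----   -----   ---------------------------------------------------------------
1       1       No restrictions (factory default, boot section unprotected)
1       0       SPM cannot write to the boot section  ← we use this
0       1       LPM executed from the application cannot read the boot section
0       0       Both: no SPM writes, and no LPM reads from the application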

 

Our Lock byte value is 0xEF (binary 1110 1111): BLB12 = 1 and BLB11 = 0, with every other lock bit left unprogrammed. Set lock bits LAST, after the bootloader is flashed and working. Lock bits can only be cleared by a full chip erase, which would also wipe your bootloader.

 

avrdude -c usbasp -p m328p -U lock:w:0xEF:m
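The same kind of check works for the lock byte, where BLB12 is bit 5 and BLB11 is bit 4. `blb1_mode()` is a hypothetical helper that returns the boot-section protection mode number (1 to 4) as the datasheet's Boot Lock Bit table numbers them; 0xEF gives mode 2.

```c
#include <stdint.h>

/* Lock byte layout (ATmega328P): bit 5 = BLB12, bit 4 = BLB11; 0 = programmed.
   Returns the BLB1 protection mode for the boot section. */
int blb1_mode(uint8_t lock)
{
    int blb12 = (lock >> 5) & 1;
    int blb11 = (lock >> 4) & 1;
    if (blb12 && blb11)   return 1;   /* no restrictions */
    if (blb12 && !blb11)  return 2;   /* SPM may not write the boot section */
    if (!blb12 && !blb11) return 3;   /* no SPM writes, no LPM reads from the application */
    return 4;                         /* BLB12 = 0, BLB11 = 1: no LPM reads from the application */
}
```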

 

2.4 — The Complete Fuse Setup

# 1. Set fuses (BOOTRST active, BOOTSZ = 512 words)

avrdude -c usbasp -p m328p -U hfuse:w:0xDC:m

 

# 2. Flash our bootloader binary

avrdude -c usbasp -p m328p -U flash:w:bootloader.hex

 

# 3. Set lock bits LAST (protect bootloader section)

avrdude -c usbasp -p m328p -U lock:w:0xEF:m

 

2.5 — Chapter 2 Summary

 

Thing              Value                    Why
----------------   ----------------------   --------------------------------------------
Bootloader size    512 words / 1024 bytes   Small enough, fits our code
Bootloader start   0x7C00                   Calculated from BOOTSZ
HFUSE              0xDC                     BOOTRST = 0, BOOTSZ = 10
Lock byte          0xEF                     SPM cannot overwrite the bootloader
Set fuses          First                    With USBasp, before anything else
Set lock bits      Last                     After the bootloader is flashed and verified

Chapter 3: Toolchain Setup (Windows)

 

3.1 — Tools Already Installed via Arduino IDE

Since Arduino IDE 1.8.x is installed, you already have everything needed. No downloads required.

 

C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\

    tools\avr-gcc\5.4.0-atmel3.6.1-arduino2\bin\

        avr-gcc.exe        ← the compiler

        avr-objcopy.exe    ← converts compiled output to .hex

        avr-size.exe       ← shows how big our binary is

 

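⚠️ Note: Replace <username> with your actual Windows username everywhere in the scripts below. The version folders (5.4.0-atmel3.6.1-arduino2 for avr-gcc, 6.3.0-arduino17 for avrdude) depend on which AVR core release is installed, so check the names on your machine. If the AVR core was never updated through Boards Manager, the same tools may instead sit in the IDE's own install folder under hardware\tools\avr\bin.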

 

3.2 — Our Project Folder

C:\AVR_Bootloader\
├── src\
│   └── main.c          ← our bootloader source code
├── build\              ← compiled output goes here
├── build.bat           ← our build script
└── flash.bat           ← our flash script

 

3.3 — The Build Batch File (build.bat)

Create build.bat in C:\AVR_Bootloader\ with this content:

 

@echo off
REM ATmega328P Bootloader Build Script

set AVR=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avr-gcc\5.4.0-atmel3.6.1-arduino2\bin
set SRC=src\main.c
set BUILD=build
set OUT=bootloader
set MCU=atmega328p
set F_CPU=16000000UL
set BOOT_ADDR=0x7C00

REM The compiler will not create the output folder, so make sure it exists
if not exist %BUILD% mkdir %BUILD%

echo [1/4] Compiling...
REM Continuation lines must follow the ^ directly, with no blank line in between
%AVR%\avr-gcc.exe -mmcu=%MCU% -DF_CPU=%F_CPU% -Os -std=c99 ^
    -Wl,--section-start=.text=%BOOT_ADDR% ^
    -o %BUILD%\%OUT%.elf %SRC%
if errorlevel 1 goto error

echo [2/4] Creating .hex file...
%AVR%\avr-objcopy.exe -O ihex -R .eeprom %BUILD%\%OUT%.elf %BUILD%\%OUT%.hex
if errorlevel 1 goto error

echo [3/4] Checking binary size...
REM Program size must stay under 1024 bytes (the percentage shown is of the full 32KB)
%AVR%\avr-size.exe --format=avr --mcu=%MCU% %BUILD%\%OUT%.elf

echo [4/4] Done!
echo Output: %BUILD%\%OUT%.hex
goto end

:error
echo BUILD FAILED!

:end
pause

 

3.4 — The Most Important Line Explained

-Wl,--section-start=.text=%BOOT_ADDR%

 

-Wl,             → pass this flag to the linker
--section-start  → place this section at this address
.text            → the section that holds the compiled code
=%BOOT_ADDR%     → = 0x7C00, a BYTE address (the linker counts AVR Flash in bytes, not words)

 

3.5 — Other Compiler Flags Explained

 

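Flag                 Purpose
------------------   ----------------------------------------------------------------------
-mmcu=atmega328p     Tells compiler exactly which AVR chip; sets correct register addresses
-DF_CPU=16000000UL   Defines F_CPU as 16MHz; used in Chapter 4 for baud rate calculation
-Os                  Optimize for SIZE, not speed: the bootloader must fit in 1024 bytes!
-std=c99             Use the C99 standard (e.g. loop variables declared inside for). Strict
                     C99 has no plain asm keyword, so inline assembly must use __asm__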

 

3.6 — Flash Script (flash.bat)

@echo off

REM ATmega328P Bootloader Flash Script

 

set AVRDUDE=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\bin\avrdude.exe

set AVRDUDE_CONF=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\etc\avrdude.conf

set HEX=build\bootloader.hex

set MCU=atmega328p

set PROGRAMMER=usbasp

 

echo [1/3] Setting fuses...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U hfuse:w:0xDC:m

if errorlevel 1 goto error

 

echo [2/3] Flashing bootloader...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U flash:w:%HEX%:i

if errorlevel 1 goto error

 

echo [3/3] Setting lock bits...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U lock:w:0xEF:m

if errorlevel 1 goto error

 

echo All done! Bootloader installed successfully.

goto end

:error

echo FAILED! Check USBasp connection.

:end

pause

 

3.7 — Chapter 3 Summary

 

File        Purpose
---------   ---------------------------------------------------
build.bat   Compiles src\main.c → build\bootloader.hex
flash.bat   Sets fuses, flashes hex, sets lock bits via USBasp

Chapter 4: UART From Scratch

 

4.1 — What is UART?

UART stands for Universal Asynchronous Receiver Transmitter. It is the simplest way two devices can talk: just two signal wires, TX (transmit) and RX (receive), plus a common ground. Asynchronous means there is no shared clock wire, so both sides must agree on the speed beforehand. That speed is the baud rate, measured in bits per second.

 

📖 Datasheet §19.1 — "The Universal Synchronous and Asynchronous serial Receiver and Transmitter (USART) is a highly flexible serial communication device"

 

4.2 — How UART Sends a Byte

When idle, the TX line sits HIGH. To send one byte (8 bits):

 

Frame = 1 start bit + 8 data bits + 1 stop bit = 10 bits total

 

Example: the byte 0x55 on the wire (start bit LOW, data bits LSB first, stop bit HIGH)

      Idle  Start D0    D1    D2    D3    D4    D5    D6    D7    Stop  Idle
HIGH  ──────      ──────      ──────      ──────      ──────      ────────────
LOW         ──────      ──────      ──────      ──────      ──────

 

At 115200 baud:

  1 bit  = 1 / 115200 = 8.68 microseconds

  1 byte = 8.68 × 10  = 86.8 microseconds

 

4.3 — The 5 Registers We Need

 

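Register   Purpose
--------   ---------------------------------------------------------------
UBRR0      Sets the baud rate (split into UBRR0H and UBRR0L)
UCSR0B     Enables transmitter and receiver
UCSR0C     Sets frame format (8 data bits, 1 stop bit)
UDR0       Write here to send, read here to receive
UCSR0A     Status flags: can we send the next byte? has a byte arrived?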

 

4.4 — UBRR0: Setting the Baud Rate

📖 Datasheet §19.10 — USART Baud Rate Registers

 

The formula to calculate UBRR (normal-speed asynchronous mode):

$$\mathrm{UBRR} = \frac{F_{CPU}}{16 \times \mathrm{BAUD}} - 1$$

For 16MHz CPU, 115200 baud:

$$\mathrm{UBRR} = \frac{16{,}000{,}000}{16 \times 115200} - 1 = 8.68 - 1 = 7.68 \approx 8 \text{ (rounded)}$$

 

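📖 Datasheet §19.11, Table 19-9 lists UBRR = 8 for 16MHz / 115200 baud in normal mode. The actual rate is then 16,000,000 / (16 × 9) ≈ 111,111 baud, an error of -3.5%. That is outside the datasheet's recommended receiver tolerance of ±2.0% for 8 data bits, so it can be unreliable with some USB-serial adapters. Optiboot avoids this by setting the double-speed bit U2X0 in UCSR0A and using UBRR = 16 (divisor 8 instead of 16), which gives +2.1%. avr-libc's <util/setbaud.h> can make this choice at compile time.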

 

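#define F_CPU   16000000UL
#define BAUD    115200UL

/* Round to nearest: plain integer division would truncate 8.68 to 8 and give 7 */
#define UBRR    ((F_CPU + 8UL * BAUD) / (16UL * BAUD) - 1)   // = 8

UBRR0H = (uint8_t)(UBRR >> 8);   // high byte first: writing UBRR0L updates the prescaler
UBRR0L = (uint8_t)UBRR;          // low byte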
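To see why rounding matters and how the two speed modes compare, here is a host-side calculator. `ubrr_value()` and `baud_error_percent()` are hypothetical helpers. `divisor` is 16 for normal speed and 8 when the U2X0 (double speed) bit in UCSR0A is set, following the datasheet's formulas BAUD = F_CPU / (16 × (UBRR + 1)) and F_CPU / (8 × (UBRR + 1)).

```c
#include <stdint.h>

/* UBRR for asynchronous mode, rounded to the nearest integer.
   divisor = 16 for normal speed, 8 for double speed (U2X0 = 1). */
uint16_t ubrr_value(uint32_t f_cpu, uint32_t baud, uint32_t divisor)
{
    return (uint16_t)((f_cpu + divisor * baud / 2) / (divisor * baud) - 1);
}

/* Baud rate error in percent for a given UBRR (negative = AVR runs slow) */
double baud_error_percent(uint32_t f_cpu, uint32_t baud, uint32_t divisor, uint16_t ubrr)
{
    double actual = (double)f_cpu / ((double)divisor * (ubrr + 1u));
    return (actual / baud - 1.0) * 100.0;
}
```

For 16MHz and 115200 baud this gives UBRR 8 with about -3.5% error in normal mode, and UBRR 16 with about +2.1% in double-speed mode. At 9600 baud the normal-mode value is 103, the number found in most Arduino references.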

 

4.5 — UCSR0B: Enabling TX and RX

📖 Datasheet §19.9.3 — UCSR0B description. We only need RXEN0 (bit 4) to enable receiver and TXEN0 (bit 3) to enable transmitter.

 

UCSR0B = (1 << RXEN0) | (1 << TXEN0);

 

4.6 — UCSR0C: Frame Format

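We want 8N1: 8 data bits, No parity, 1 stop bit. This is the frame format Avrdude uses when talking to Arduino bootloaders. It is also the reset value of UCSR0C (0x06), so the write below mainly makes our intent explicit.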

 

📖 Datasheet §19.9.4 — UCSR0C description. UCSZ01:00 = 11 sets 8 data bits. UPM = 00 = no parity. USBS = 0 = 1 stop bit.

 

UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);

 

4.7 — Putting It Together: uart_init()

#include <avr/io.h>   // register and bit names such as UBRR0H and RXEN0

void uart_init(void)

{

    UBRR0H = 0;   // high byte of 8

    UBRR0L = 8;   // low byte  of 8

    UCSR0B = (1 << RXEN0) | (1 << TXEN0);

    UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);

}

 

4.8 — UCSR0A: Status Flags

📖 Datasheet §19.9.1 — UCSR0A description. UDRE0: TX buffer empty — safe to send next byte. RXC0: New byte waiting to be read.

 

4.9 — uart_send() and uart_receive()

void uart_send(uint8_t byte)

{

    while (!(UCSR0A & (1 << UDRE0)));  // wait until TX buffer empty

    UDR0 = byte;                        // hardware sends it automatically

}

 

uint8_t uart_receive(void)

{

    while (!(UCSR0A & (1 << RXC0)));   // wait until byte arrives

    return UDR0;                         // read and return byte

}

 

4.10 — Chapter 4 Summary

 

Register   What We Set                      Why
--------   ------------------------------   ---------------------------------------------
UBRR0H:L   8                                115200 baud at 16MHz (normal speed)
UCSR0B     RXEN0 | TXEN0                    Enable RX and TX hardware
UCSR0C     UCSZ01 | UCSZ00                  8N1 frame format
UCSR0A     Nothing (we only read it)        Check UDRE0 before send, RXC0 before receive
UDR0       Write to send, read to receive   The actual data register

 


 

Chapter 5: The STK500v1 Protocol

 

5.1 — What is STK500v1?

When you hit Upload in Arduino IDE, Avrdude runs and starts sending bytes over UART to our bootloader following a protocol called STK500v1. Every exchange follows the same structure:

 

Every COMMAND Avrdude sends:
┌──────────┬──────────────────────┬──────────┐
│ CMD      │ PARAMETERS           │ CRC_EOP  │
│ 1 byte   │ 0 or more bytes      │ 0x20     │
└──────────┴──────────────────────┴──────────┘

Every RESPONSE we send:
┌──────────┬──────────────────────┬──────────┐
│ INSYNC   │ DATA (if any)        │ OK       │
│ 0x14     │                      │ 0x10     │
└──────────┴──────────────────────┴──────────┘

 

5.2 — Protocol Constants

#define STK_OK              0x10

#define STK_INSYNC          0x14

#define STK_NOSYNC          0x15

#define STK_CRC_EOP         0x20

#define STK_GET_SYNC        0x30

#define STK_GET_PARAMETER   0x41

#define STK_SET_DEVICE      0x42

#define STK_SET_DEVICE_EXT  0x45

#define STK_ENTER_PROGMODE  0x50

#define STK_LEAVE_PROGMODE  0x51

#define STK_LOAD_ADDRESS    0x55

#define STK_PROG_PAGE       0x64

#define STK_READ_PAGE       0x74

#define STK_READ_SIGN       0x75

 

5.3 — Commands We Handle

 

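Command                 Parameters                       What We Do
---------------------   ------------------------------   ---------------------------------------
GET_SYNC (0x30)         None                             Reply INSYNC + OK
GET_PARAMETER (0x41)    1 byte param ID                  Reply INSYNC + version number + OK
SET_DEVICE (0x42)       20 bytes                         Drain bytes, reply INSYNC + OK
SET_DEVICE_EXT (0x45)   5 bytes                          Drain bytes, reply INSYNC + OK
ENTER_PROGMODE (0x50)   None                             Reply INSYNC + OK
LOAD_ADDRESS (0x55)     2 byte word address, low first   Store address, reply INSYNC + OK
PROG_PAGE (0x64)        Length (2 bytes, high first),    Write to Flash (Ch6), reply INSYNC + OK
                        memory type, data
READ_PAGE (0x74)        Length (2 bytes, high first),    Reply INSYNC + Flash bytes + OK
                        memory type
READ_SIGN (0x75)        None                             Reply INSYNC + 0x1E 0x95 0x0F + OK
LEAVE_PROGMODE (0x51)   None                             Reply INSYNC + OK, jump to app (Ch7)

Every command ends with CRC_EOP (0x20), which we read and check before sending INSYNC.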
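The dispatch logic behind this table can be tried on a PC with a buffer standing in for the UART. This is a minimal sketch under that assumption: `StkSession`, `rx()`, `tx()`, `drain()`, `eop_ok()` and `stk_process_one()` are hypothetical names (on the chip, rx/tx would be uart_receive()/uart_send()), the reported software version 4.4 is arbitrary, and PROG_PAGE/READ_PAGE are left out to keep it short. It shows the pattern every handler follows: read the parameters, check the trailing CRC_EOP, then answer INSYNC ... OK, or NOSYNC if the frame was malformed.

```c
#include <stddef.h>
#include <stdint.h>

#define STK_OK              0x10
#define STK_INSYNC          0x14
#define STK_NOSYNC          0x15
#define STK_CRC_EOP         0x20
#define STK_GET_PARAMETER   0x41
#define STK_SET_DEVICE      0x42
#define STK_SET_DEVICE_EXT  0x45
#define STK_LOAD_ADDRESS    0x55
#define STK_READ_SIGN       0x75

/* Host-side stand-in for the UART: bytes from Avrdude in, replies out */
typedef struct {
    const uint8_t *in;
    size_t   in_len, in_pos;
    uint8_t  out[64];
    size_t   out_len;
    uint16_t address;          /* word address from LOAD_ADDRESS */
} StkSession;

static uint8_t rx(StkSession *s) { return s->in_pos < s->in_len ? s->in[s->in_pos++] : 0; }
static void tx(StkSession *s, uint8_t b) { if (s->out_len < sizeof s->out) s->out[s->out_len++] = b; }
static void drain(StkSession *s, int n) { while (n-- > 0) (void)rx(s); }

/* Check the trailing CRC_EOP: reply INSYNC if present, NOSYNC if not */
static int eop_ok(StkSession *s)
{
    if (rx(s) != STK_CRC_EOP) { tx(s, STK_NOSYNC); return 0; }
    tx(s, STK_INSYNC);
    return 1;
}

/* Handle exactly one command from the input; returns the command byte */
uint8_t stk_process_one(StkSession *s)
{
    uint8_t cmd = rx(s);
    switch (cmd) {
    case STK_GET_PARAMETER: {
        uint8_t which = rx(s);
        if (!eop_ok(s)) break;
        /* 0x81/0x82 = software major/minor (hypothetical 4.4); 0x03 otherwise */
        tx(s, (which == 0x81 || which == 0x82) ? 4 : 3);
        tx(s, STK_OK);
        break;
    }
    case STK_SET_DEVICE:     drain(s, 20); if (eop_ok(s)) tx(s, STK_OK); break;
    case STK_SET_DEVICE_EXT: drain(s, 5);  if (eop_ok(s)) tx(s, STK_OK); break;
    case STK_LOAD_ADDRESS: {
        uint8_t lo = rx(s);                /* low byte arrives first */
        uint8_t hi = rx(s);
        s->address = (uint16_t)((hi << 8) | lo);
        if (eop_ok(s)) tx(s, STK_OK);
        break;
    }
    case STK_READ_SIGN:
        if (!eop_ok(s)) break;
        tx(s, 0x1E); tx(s, 0x95); tx(s, 0x0F);
        tx(s, STK_OK);
        break;
    default:   /* GET_SYNC, ENTER/LEAVE_PROGMODE and unrecognised commands */
        if (eop_ok(s)) tx(s, STK_OK);
        break;
    }
    return cmd;
}
```

Answering unrecognised commands with INSYNC + OK instead of ignoring them keeps Avrdude in sync; Optiboot takes the same approach.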

 

5.4 — Key Command Details

STK_LOAD_ADDRESS — Set Write Address

case STK_LOAD_ADDRESS:

{

    uint16_t lo = uart_receive();

    uint16_t hi = uart_receive();

    address = (hi << 8) | lo;

    /* address is a WORD address — multiply by 2 for SPM */

    get_sync();

    uart_send(STK_INSYNC);

    uart_send(STK_OK);

    break;

}

 

STK_PROG_PAGE — Write Flash

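case STK_PROG_PAGE:
{
    /* Read the length in two statements: C leaves the evaluation order of
       the operands of | unspecified, so a one-liner may swap the two bytes */
    uint16_t len  = (uint16_t)uart_receive() << 8;  // high byte first
    len          |= uart_receive();                 // then low byte
    uint8_t  type = uart_receive();                 // 'F' = Flash, 'E' = EEPROM
    for (uint16_t i = 0; i < len; i++)
    {
        uint8_t b = uart_receive();
        if (i < PAGE_SIZE)                          // never overrun page_buffer (PAGE_SIZE = 128, Chapter 6)
            page_buffer[i] = b;
    }
    get_sync();
    if (type == 'F' && len <= PAGE_SIZE)
        write_flash_page(address, page_buffer, len);  // Chapter 6
    uart_send(STK_INSYNC);
    uart_send(STK_OK);
    break;
}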

 

STK_READ_SIGN — Chip Signature

📖 Datasheet §27.8.1 — "The ATmega328P has a three byte signature code: 0x1E, 0x95, 0x0F"

 

case STK_READ_SIGN:

    get_sync();

    uart_send(STK_INSYNC);

    uart_send(0x1E);   // Atmel/Microchip

    uart_send(0x95);   // 32KB Flash

    uart_send(0x0F);   // ATmega328P

    uart_send(STK_OK);

    break;

 

5.5 — The Full Upload Sequence

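Avrdude                               Our Bootloader
   │── GET_SYNC (×several) ─────────────────►│
   │◄── INSYNC + OK ─────────────────────────│
   │── GET_PARAMETER (HW/SW versions) ──────►│
   │◄── INSYNC + value + OK ─────────────────│
   │── SET_DEVICE (20 bytes) ───────────────►│ (contents ignored)
   │◄── INSYNC + OK ─────────────────────────│
   │── SET_DEVICE_EXT (5 bytes) ────────────►│ (contents ignored)
   │◄── INSYNC + OK ─────────────────────────│
   │── ENTER_PROGMODE ──────────────────────►│
   │◄── INSYNC + OK ─────────────────────────│
   │── READ_SIGN ───────────────────────────►│
   │◄── INSYNC + 1E 95 0F + OK ──────────────│
   │                                         │
   │   Write pass, once per 128-byte page:   │
   │── LOAD_ADDRESS (0x0000, 0x0040, ...) ──►│
   │◄── INSYNC + OK ─────────────────────────│
   │── PROG_PAGE (128 bytes) ───────────────►│ erase + write page
   │◄── INSYNC + OK ─────────────────────────│
   │                                         │
   │   Verify pass, once per page:           │
   │── LOAD_ADDRESS ────────────────────────►│
   │◄── INSYNC + OK ─────────────────────────│
   │── READ_PAGE ───────────────────────────►│ read Flash back
   │◄── INSYNC + 128 bytes + OK ─────────────│
   │                                         │
   │── LEAVE_PROGMODE ──────────────────────►│
   │◄── INSYNC + OK ─────────────────────────│
                                        jump to app (Ch7)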

Chapter 6: Flash Self-Programming

 

6.1 — What is Flash Self-Programming?

Normal code reads from Flash. Our bootloader needs to write to Flash — writing the incoming sketch data into the application section. This is called Self-Programming — the chip modifying its own Flash while running.

 

📖 Datasheet §27.1 — "The Boot program can use any available data interface and associated protocol to read code and write (program) that code into the Flash memory"

 

6.2 — The Most Important Constraint: Pages

You cannot write individual bytes to Flash. Flash is organized into fixed blocks called pages. You must write a whole page at a time.

 

📖 Datasheet §27.5 — "The Flash is organized in pages. When programming the Flash, the program data must be written one page at a time"

 

Flash page size = 64 WORDS = 128 BYTES

Total pages     = 256 pages

256 pages × 128 bytes = 32,768 bytes = 32KB

 

📖 Datasheet §27.5, Table 27-5 — "Page Size: 64 words / 128 bytes"

 

6.3 — The 3 Step Writing Process

Step 1 — ERASE the page

         Flash bits can only go 1→0 when writing.

         Erase resets all bits back to 1 (0xFF).

         Must erase before writing new data.

 

Step 2 — FILL the page buffer

         Load your 128 bytes into a temporary hardware

         buffer inside the chip (word by word — 2 bytes at a time).

         NOT written to Flash yet.

 

Step 3 — WRITE the page buffer to Flash

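         Hardware copies buffer into actual Flash page.

         Takes 3.7 to 4.5 ms. While an RWW page is being written,
         the CPU keeps running bootloader code (ours just busy-waits);
         while an NRWW page is being written, the CPU halts until done.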

 

⚠️ Critical: If you skip the erase step, bits that were 0 stay 0. Your new data gets corrupted. Always erase first.

 

📖 Datasheet §27.3 — Page Erase, Fill Temporary Buffer, Write Page from Temporary Buffer

 

6.4 — The SPM Instruction and boot.h

SPM (Store Program Memory) is a special AVR assembly instruction. We use <avr/boot.h> which wraps it in clean C macros:

 

#include <avr/boot.h>

 

boot_page_erase(byte_address);    // Erase one page

boot_spm_busy_wait();             // Wait for operation to complete

boot_page_fill(byte_address, w);  // Fill one word (2 bytes) into buffer

boot_page_write(byte_address);    // Write page buffer to Flash

boot_rww_enable();                // Re-enable app section reading

 

6.5 — RWW vs NRWW Sections

📖 Datasheet §27.1 — "The Flash memory is organized in two sections: Read-While-Write (RWW) and No Read-While-Write (NRWW)"

 

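The RWW/NRWW split is fixed in hardware and does not move with BOOTSZ: on the ATmega328P, bytes 0x0000-0x6FFF form the RWW section and 0x7000-0x7FFF the NRWW section. Our bootloader (0x7C00-0x7FFF) runs from NRWW, so it keeps executing while an RWW page is erased or written. Application pages between 0x7000 and 0x7BFF also lie in NRWW; writing them simply halts the CPU until the operation finishes. After writing any RWW page, that section cannot be read until we call boot_rww_enable(), which must happen before verifying a page or starting the application.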

 

6.6 — write_flash_page(): The Full Function

#include <avr/boot.h>
#include <stdint.h>

#define PAGE_SIZE  128        /* bytes per Flash page */
#define BOOT_START 0x7C00     /* byte address where our bootloader starts */

void write_flash_page(uint16_t word_addr,
                      uint8_t  *data,
                      uint16_t  length)
{
    /* Convert the word address (from STK_LOAD_ADDRESS) to a byte address for SPM */
    uint32_t byte_addr = (uint32_t)word_addr * 2;

    /* Safety: only whole, page-aligned pages below the bootloader section */
    if (byte_addr >= BOOT_START || (byte_addr % PAGE_SIZE) != 0 || length > PAGE_SIZE)
        return;

    /* Step 1: erase the page (3.7 to 4.5 ms) */
    boot_page_erase(byte_addr);
    boot_spm_busy_wait();

    /* Step 2: fill the temporary page buffer word by word (low byte first) */
    for (uint16_t i = 0; i < length; i += 2)
    {
        uint16_t word = data[i] | ((uint16_t)data[i + 1] << 8);
        boot_page_fill(byte_addr + i, word);
    }

    /* Step 3: write the page buffer to Flash (3.7 to 4.5 ms) */
    boot_page_write(byte_addr);
    boot_spm_busy_wait();

    /* Re-enable the RWW section so it can be read again (verify, app start) */
    boot_rww_enable();
}
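Two details write_flash_page() depends on, the byte order inside a word and page alignment, can be checked on a PC. `pack_word()` and `page_write_allowed()` are hypothetical helpers that assume the 128-byte page size and the 0x7C00 boot start used in this tutorial. AVR instruction words are stored little-endian, so the bytes 0C 94 from a .hex file (the first word of a JMP instruction, as found at the start of a typical avr-gcc vector table) become the word 0x940C.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  128u      /* bytes per ATmega328P Flash page */
#define BOOT_START 0x7C00u   /* byte address, BOOTSZ = 512 words */

/* Combine two bytes the way boot_page_fill() expects: first byte = low half */
uint16_t pack_word(const uint8_t *data, size_t i)
{
    return (uint16_t)(data[i] | ((uint16_t)data[i + 1] << 8));
}

/* Check a word address from LOAD_ADDRESS before writing a page there:
   it must be page aligned and the whole page must lie below the boot section */
int page_write_allowed(uint16_t word_addr)
{
    uint32_t byte_addr = (uint32_t)word_addr * 2u;
    if (byte_addr % PAGE_SIZE != 0)
        return 0;
    return byte_addr + PAGE_SIZE <= BOOT_START;
}
```

The last writable page starts at word 0x3DC0 (byte 0x7B80); word 0x3E00 is the first page of the bootloader itself and is refused.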

 

6.7 — Timing

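📖 Datasheet §27.8.1, Table 27-14: a page erase or a page write by SPM takes 3.7 ms minimum and 4.5 ms maximum, so one page costs roughly 7.4 to 9 ms. The application area below 0x7C00 holds 248 pages (31,744 bytes), so programming a sketch that fills it takes roughly 1.8 to 2.2 seconds of SPM time, on top of the time needed to send the data over UART.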
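The arithmetic behind these numbers, as a small host program. The helpers are hypothetical; the SPM time per operation is passed in (3.7 ms minimum, 4.5 ms maximum), and the UART estimate counts 10 bits per byte for 8N1 while ignoring STK500 framing and the verify pass.

```c
#include <stdint.h>

#define PAGE_SIZE   128u
#define BOOT_START  0x7C00u   /* first byte of the bootloader */

/* Number of Flash pages available to the application */
uint32_t app_pages(void) { return BOOT_START / PAGE_SIZE; }

/* Programming time in microseconds: one erase plus one write per page */
uint32_t program_time_us(uint32_t pages, uint32_t spm_us) { return pages * 2u * spm_us; }

/* Time to move 'bytes' payload bytes over 8N1 UART (10 bits per byte), no protocol overhead */
uint32_t uart_time_us(uint32_t bytes, uint32_t baud)
{
    return (uint32_t)((uint64_t)bytes * 10u * 1000000u / baud);
}
```

With 248 application pages this gives 1.84 s at 3.7 ms and 2.23 s at 4.5 ms of SPM time, plus about 2.76 s just to transfer 31,744 bytes at 115200 baud.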

 

6.8 — Chapter 6 Summary

 

Concept        Key Point
------------   -------------------------------------------------
Page size      128 bytes; must write whole pages
3 steps        Erase → Fill buffer → Write
SPM            Hardware instruction for Flash writing
<avr/boot.h>   Clean C macros wrapping SPM
Timing         About 7.4 to 9 ms per page (erase + write)
Word vs byte   Multiply word address × 2 for SPM
RWW            Must call boot_rww_enable() after every write
Guard check    Never write at or above 0x7C00

Chapter 7: Jumping to the App & Watchdog Timer

 

7.1 — What is the Watchdog Timer?

The Watchdog Timer (WDT) is a completely independent hardware timer built into the ATmega328P. It runs on its own internal oscillator — separate from your main clock. Think of it like a dead man's switch: your code must periodically kick the watchdog to prove it is still alive. If it does not kick it in time — the watchdog resets the CPU. No exceptions.

 

📖 Datasheet §10.1 — "The Watchdog Timer is clocked from a separate on-chip oscillator which runs at 128kHz"

 

7.2 — Why Its Own Oscillator?

The Watchdog runs on its own 128kHz oscillator completely independent from the main 16MHz system clock. Even if your code completely locks up in an infinite loop, crashes into invalid memory, or the main oscillator glitches — the watchdog timer keeps counting and resets the chip.

 

7.3 — What Can the Watchdog Do?

📖 Datasheet §10.2 — Watchdog Timer modes

 

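Mode                              What Happens When Timer Expires
-------------------------------   ---------------------------------------------------------------
System Reset mode  ← we use       Chip resets immediately. PC goes to 0x0000 or 0x7C00 (per BOOTRST)
Interrupt mode                    Fires an interrupt. Your ISR handles it. Code keeps running.
Interrupt and System Reset mode   First timeout fires the interrupt and hardware clears WDIE;
                                  the next timeout resets the chip unless the ISR sets WDIE again.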

 

7.4 — Watchdog Timeout Periods

📖 Datasheet §10.3, Table 10-2 — Watchdog Timer prescale select

 

WDP3:0   Timeout   Use Case
------   -------   -----------------------------
0000     16 ms     Very tight safety loop
0101     0.5 s
0110     1 s       ← we use this (upload window)
0111     2 s       A common safety timeout
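The table values follow a simple rule: each WDP step doubles the number of 128kHz oscillator cycles, starting from 2K cycles. `wdt_timeout_ms()` is a hypothetical helper that computes the nominal timeout; real timeouts drift with supply voltage and temperature because the watchdog oscillator is not precise.

```c
#include <stdint.h>

/* Nominal watchdog timeout for prescaler bits WDP3:0 (0..9 are valid on the ATmega328P).
   Cycle count = 2K << WDP, clocked at a nominal 128 kHz. */
uint32_t wdt_timeout_ms(uint8_t wdp)
{
    if (wdp > 9)
        return 0;                          /* reserved encodings */
    uint32_t cycles = 2048ul << wdp;       /* 2K, 4K, ..., 1024K cycles */
    return cycles * 1000ul / 128000ul;     /* cycles / 128 kHz, in ms */
}
```

The nominal values come out as 512, 1024 and 2048 ms for WDP = 0101, 0110 and 0111; the datasheet's 0.5 s, 1.0 s and 2.0 s are these figures rounded.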

 

7.5 — Why We Need It In Our Bootloader

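Without a timeout, our uart_receive() waits forever. If nobody connects, the bootloader is stuck and your sketch never runs. The watchdog timer gives us a 1-second window: if Avrdude connects, we disable the watchdog and proceed with the upload. If nobody connects, the watchdog fires, the chip resets, and we detect WDRF and jump to the app immediately. The trade-off: once the watchdog is disabled, an upload that stops halfway leaves the bootloader waiting until the next manual reset. Optiboot avoids this by keeping the watchdog running and calling wdt_reset() for every byte it receives.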

 

7.6 — Detecting a Watchdog Reset (MCUSR)

📖 Datasheet §8.4 — MCUSR register — tells us WHY the chip reset. WDRF (bit 3) is set when watchdog caused the reset.

 

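/* Very first thing in main() */
uint8_t mcusr = MCUSR;    // save reset reason
MCUSR = 0;                // clear all flags (WDRF must be cleared before WDE can be)

if (mcusr & (1 << WDRF))
{
    /* Watchdog reset → hardware is already in its clean reset state */
    wdt_disable();              // the WDT stays enabled after a watchdog reset
    ((void (*)(void))0)();      // jump straight to the app at 0x0000
    /* Not jump_to_app(): that re-arms the watchdog, which causes another
       WDRF reset and an endless reset loop. */
}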
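MCUSR can hold several flags at once, because the flags stay set until software clears them or a power-on reset occurs. This is why the code above saves a copy before clearing the register. The host-side sketch below decodes such a saved copy using the MCUSR bit positions (PORF = bit 0, EXTRF = bit 1, BORF = bit 2, WDRF = bit 3). The enum, the function name and the priority order are illustrative assumptions.

```c
#include <stdint.h>

/* MCUSR bit positions on the ATmega328P */
#define PORF  0
#define EXTRF 1
#define BORF  2
#define WDRF  3

typedef enum {
    RESET_NONE,       /* no flag set, e.g. MCUSR already cleared by software */
    RESET_POWER_ON,
    RESET_EXTERNAL,   /* RESET pin pulled low, e.g. the Arduino DTR auto-reset */
    RESET_BROWN_OUT,
    RESET_WATCHDOG
} reset_reason;

/* Hypothetical helper. WDRF is checked first because it alone decides
 * whether the bootloader hands straight over to the app. */
reset_reason decode_reset(uint8_t mcusr)
{
    if (mcusr & (1U << WDRF))  return RESET_WATCHDOG;
    if (mcusr & (1U << EXTRF)) return RESET_EXTERNAL;
    if (mcusr & (1U << BORF))  return RESET_BROWN_OUT;
    if (mcusr & (1U << PORF))  return RESET_POWER_ON;
    return RESET_NONE;
}
```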

 

7.7 — Why We Cannot Just goto 0x0000

Problem 1: The watchdog keeps running. A normal sketch never pets it, so the watchdog fires after 1 second, the chip resets, the bootloader runs again, and the cycle repeats forever.

Problem 2: Dirty hardware state. The bootloader has initialized the UART and modified other hardware state. The app inherits all of that instead of a clean reset state and may break.

 

7.8 — The Correct Way: Watchdog Reset Trick

Instead of jumping → use the watchdog to RESET the chip!

 

1. Set watchdog to shortest timeout (16ms)

2. Do nothing (don't pet it)

3. Watchdog fires after 16ms

4. Chip fully resets — clean hardware state!

5. Bootloader runs again

6. Sees WDRF flag in MCUSR → clears MCUSR, disables the watchdog (it stays enabled after a watchdog reset), and jumps to 0x0000 immediately

7. App runs with perfectly clean hardware state ✅
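Step 6 has to jump straight to 0x0000 instead of calling jump_to_app() a second time. The host-side simulation below shows why. The hypothetical boot() function models one pass of the bootloader's reset-reason check. resets_until_app() counts the watchdog resets that happen before the app runs. The simulation assumes the upload window has already closed, or STK_LEAVE_PROGMODE has been received, so the bootloader's only remaining job is the handoff. A variant that re-arms the watchdog on WDRF never reaches the app.

```c
#include <stdbool.h>
#include <stdint.h>

#define WDRF_MASK (1U << 3)   /* WDRF bit in MCUSR */

typedef enum { ACTION_RUN_APP, ACTION_WATCHDOG_RESET } boot_action;

/* One bootloader pass after a reset (hypothetical model). */
boot_action boot(uint8_t mcusr, bool rearm_on_wdrf)
{
    if (mcusr & WDRF_MASK)
        return rearm_on_wdrf ? ACTION_WATCHDOG_RESET   /* bug: jump_to_app() again */
                             : ACTION_RUN_APP;         /* fix: wdt_disable(), jump to 0x0000 */
    return ACTION_WATCHDOG_RESET;                      /* jump_to_app(): 16 ms WDT reset */
}

/* Number of watchdog resets before the app runs, or -1 if it still has
 * not run after max_resets resets. */
int resets_until_app(uint8_t first_mcusr, bool rearm_on_wdrf, int max_resets)
{
    uint8_t mcusr = first_mcusr;
    for (int resets = 0; resets <= max_resets; resets++) {
        if (boot(mcusr, rearm_on_wdrf) == ACTION_RUN_APP)
            return resets;
        mcusr = WDRF_MASK;    /* every following reset comes from the watchdog */
    }
    return -1;
}
```

Starting from an external reset (MCUSR = 0x02), the fixed flow reaches the app after exactly one watchdog reset. The re-arming variant returns -1.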

 

📖 Datasheet §10.8 — Watchdog System Reset Mode

 

7.9 — The Code

#include <avr/wdt.h>

/* Enable 1 second watchdog */
void watchdog_enable_1s(void)
{
    wdt_enable(WDTO_1S);
}

/* Safe jump to application */
void jump_to_app(void)
{
    /* WDTO_15MS is avr-libc's name for the shortest (~16 ms) timeout.
       After ~16ms → RESET → clean hardware state.
       Bootloader restarts → sees WDRF → jumps to 0x0000. */
    wdt_enable(WDTO_15MS);
    while (1);              // sit and wait for reset
}

/* Disable watchdog once upload starts */
void watchdog_disable(void)
{
    wdt_reset();
    wdt_disable();
}

 

7.10 — Chapter 7 Summary

 

Concept          Key Point
Watchdog         Independent 128kHz hardware timer
Purpose          Resets chip if code hangs or crashes
Our use          1 second upload window timeout
MCUSR            Tells us WHY the chip reset
WDRF flag        Set when watchdog caused the reset; bootloader then disables WDT and jumps to 0x0000
jump_to_app()    Set WDT to 16ms, loop, let it reset cleanly
Disable WDT      Call once Avrdude connects, because the upload can outlast the 1 s window
Clean reset      Watchdog reset gives app a clean hardware state

Chapter 8: The Complete Bootloader

 

8.1 — The Complete main.c

Create this file at C:\AVR_Bootloader\src\main.c

 

/*
 * ATmega328P Bootloader
 * Compatible with Arduino IDE (STK500v1 / Avrdude)
 *
 * Application : 0x0000 - 0x7BFF (31,744 bytes)
 * Bootloader  : 0x7C00 - 0x7FFF (1,024  bytes)
 * HFUSE = 0xDC (BOOTSZ1:0 = 10 → 512 words at 0x7C00, BOOTRST programmed)
 * LOCK  = 0xEF  UART = 115200 8N1 (double-speed mode, U2X0 = 1)
 */

#ifndef F_CPU
#define F_CPU               16000000UL   /* before the includes, in case a header uses it */
#endif

#include <avr/io.h>
#include <avr/boot.h>
#include <avr/pgmspace.h>
#include <avr/interrupt.h>
#include <avr/wdt.h>
#include <stdint.h>

#define BOOT_START          0x7C00
#define PAGE_SIZE           128          /* flash page size in bytes */
#define BAUD                115200UL
/* Double-speed divisor, rounded: UBRR = 16 → 117,647 baud (+2.1 %), the
 * datasheet's value for 115200 at 16 MHz. The normal-speed formula
 * (F_CPU / (16 * BAUD)) - 1 truncates to 7 → 125,000 baud (+8.5 %). */
#define UBRR_VAL            (((F_CPU + 4UL * BAUD) / (8UL * BAUD)) - 1)

#define STK_OK              0x10
#define STK_INSYNC          0x14
#define STK_NOSYNC          0x15
#define STK_CRC_EOP         0x20
#define STK_GET_SYNC        0x30
#define STK_GET_PARAMETER   0x41
#define STK_SET_DEVICE      0x42
#define STK_SET_DEVICE_EXT  0x45
#define STK_ENTER_PROGMODE  0x50
#define STK_LEAVE_PROGMODE  0x51
#define STK_LOAD_ADDRESS    0x55
#define STK_PROG_PAGE       0x64
#define STK_READ_PAGE       0x74
#define STK_READ_SIGN       0x75
#define SIGNATURE_0         0x1E
#define SIGNATURE_1         0x95
#define SIGNATURE_2         0x0F

static uint8_t  page_buffer[PAGE_SIZE];
static uint16_t address = 0;             /* WORD address from STK_LOAD_ADDRESS */

/* ── UART ── */
void uart_init(void) {
    UCSR0A = (1 << U2X0);                       /* double-speed mode */
    UBRR0H = (uint8_t)(UBRR_VAL >> 8);
    UBRR0L = (uint8_t)(UBRR_VAL);
    UCSR0B = (1 << RXEN0) | (1 << TXEN0);
    UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);     /* 8 data bits, no parity, 1 stop */
}
void uart_send(uint8_t b) {
    while (!(UCSR0A & (1 << UDRE0)));
    UDR0 = b;
}
uint8_t uart_receive(void) {
    while (!(UCSR0A & (1 << RXC0)));
    return UDR0;
}
/* Big-endian 16-bit read in a guaranteed order. Writing
 * (uart_receive() << 8) | uart_receive() leaves the call order unspecified in C. */
uint16_t uart_receive_be16(void) {
    uint16_t hi = uart_receive();
    uint16_t lo = uart_receive();
    return (hi << 8) | lo;
}

/* ── STK500v1 helpers ── */
/* Returns 1 if the command ends with CRC_EOP; otherwise answers NOSYNC only. */
uint8_t get_sync(void) {
    if (uart_receive() != STK_CRC_EOP) {
        uart_send(STK_NOSYNC);
        return 0;
    }
    return 1;
}
void reply_ok(void) {
    if (get_sync()) { uart_send(STK_INSYNC); uart_send(STK_OK); }
}

/* ── Flash self-programming ── */
void write_flash_page(uint16_t word_addr, uint8_t *data, uint16_t len) {
    if (word_addr >= (BOOT_START / 2)) return;      /* never overwrite the bootloader */
    uint32_t byte_addr = (uint32_t)word_addr * 2;
    boot_page_erase(byte_addr);  boot_spm_busy_wait();
    for (uint16_t i = 0; i < len; i += 2) {
        uint16_t word = data[i] | ((uint16_t)data[i+1] << 8);   /* little-endian */
        boot_page_fill(byte_addr + i, word);
    }
    boot_page_write(byte_addr);  boot_spm_busy_wait();
    boot_rww_enable();                              /* make the app section readable again */
}

/* ── Jump to application ── */
void jump_to_app(void) {
    wdt_enable(WDTO_15MS);
    while (1);
}

/* ── Main ── */
int main(void) {
    uint8_t mcusr = MCUSR;
    MCUSR = 0;                          /* clear WDRF first, or WDE cannot be cleared */
    if (mcusr & (1 << WDRF)) {
        wdt_disable();                  /* WDT stays enabled after a watchdog reset */
        ((void (*)(void))0)();          /* jump to 0x0000 */
    }
    wdt_enable(WDTO_1S);
    uart_init();
    while (1) {
        uint8_t cmd = uart_receive();
        switch (cmd) {
        case STK_GET_SYNC:
            wdt_disable();
            reply_ok();
            break;
        case STK_GET_PARAMETER: {
            uint8_t p = uart_receive();
            if (!get_sync()) break;
            uart_send(STK_INSYNC);
            if      (p == 0x80) uart_send(0x02);
            else if (p == 0x81) uart_send(0x01);
            else                uart_send(0x00);
            uart_send(STK_OK);
            break; }
        case STK_SET_DEVICE:
            for (uint8_t i = 0; i < 20; i++) uart_receive();
            reply_ok();
            break;
        case STK_SET_DEVICE_EXT:
            for (uint8_t i = 0; i < 5; i++) uart_receive();
            reply_ok();
            break;
        case STK_ENTER_PROGMODE:
            reply_ok();
            break;
        case STK_LOAD_ADDRESS: {
            uint16_t lo = uart_receive();           /* little-endian WORD address */
            uint16_t hi = uart_receive();
            address = (hi << 8) | lo;
            reply_ok();
            break; }
        case STK_PROG_PAGE: {
            uint16_t len  = uart_receive_be16();    /* big-endian byte count */
            uint8_t  type = uart_receive();         /* 'F' = flash, 'E' = EEPROM */
            for (uint16_t i = 0; i < len; i++) {
                uint8_t b = uart_receive();
                if (i < PAGE_SIZE) page_buffer[i] = b;   /* never overflow the buffer */
            }
            if (!get_sync()) break;
            if (type == 'F')
                write_flash_page(address, page_buffer, len < PAGE_SIZE ? len : PAGE_SIZE);
            uart_send(STK_INSYNC); uart_send(STK_OK);
            break; }
        case STK_READ_PAGE: {
            uint16_t len  = uart_receive_be16();
            uint8_t  type = uart_receive();
            if (!get_sync()) break;
            uart_send(STK_INSYNC);
            if (type == 'F')
                for (uint16_t i = 0; i < len; i++)
                    uart_send(pgm_read_byte(address * 2 + i));
            uart_send(STK_OK);
            break; }
        case STK_READ_SIGN:
            if (!get_sync()) break;
            uart_send(STK_INSYNC);
            uart_send(SIGNATURE_0);
            uart_send(SIGNATURE_1);
            uart_send(SIGNATURE_2);
            uart_send(STK_OK);
            break;
        case STK_LEAVE_PROGMODE:
            reply_ok();
            jump_to_app();
            break;
        default:
            reply_ok();
            break;
        }
    }
    return 0;
}
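The baud divisor is easy to get wrong. At 16 MHz and 115200 baud, a truncating formula such as (F_CPU / (16 * BAUD)) - 1 yields 7, and that runs the UART at 125,000 baud, about 8.5 % fast. The host-side sketch below computes a rounded UBRR for normal speed (divider 16) and for double speed, U2X0 (divider 8). It then reports the resulting error in tenths of a percent. The function names are hypothetical. The formulas are the datasheet's baud = f_cpu / (divider × (UBRR + 1)).

```c
#include <stdint.h>

/* Rounded divisor; divider is 16 (normal speed) or 8 (double speed). */
uint16_t ubrr_rounded(uint32_t f_cpu, uint32_t baud, uint32_t divider)
{
    return (uint16_t)((f_cpu + divider * baud / 2) / (divider * baud) - 1);
}

/* Truncating divisor, as in (F_CPU / (16 * BAUD)) - 1. */
uint16_t ubrr_truncated(uint32_t f_cpu, uint32_t baud, uint32_t divider)
{
    return (uint16_t)(f_cpu / (divider * baud) - 1);
}

/* Baud rate actually produced by a given UBRR. */
uint32_t actual_baud(uint32_t f_cpu, uint32_t divider, uint16_t ubrr)
{
    return f_cpu / (divider * ((uint32_t)ubrr + 1));
}

/* Error in tenths of a percent (21 means +2.1 %), truncated toward zero. */
int32_t baud_error_permille(uint32_t actual, uint32_t wanted)
{
    return (int32_t)(((int64_t)actual - wanted) * 1000 / (int64_t)wanted);
}
```

For 16 MHz and 115200 this gives UBRR 7 (+8.5 %) when truncated, 8 (-3.5 %) rounded at normal speed, and 16 (+2.1 %) at double speed. The last two match the datasheet's baud-rate table.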

 

8.2 — Build It

Open a command prompt in C:\AVR_Bootloader\ and run build.bat. The critical check: the reported program size must be at most 1024 bytes, because that is the size of our bootloader section.
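The 1024-byte limit comes from the BOOTSZ1:0 bits in the high fuse, which select one of four boot-section sizes on the ATmega328P: 11 = 256 words, 10 = 512, 01 = 1024, 00 = 2048. The host-side sketch below decodes an HFUSE byte into the boot size and start address, so you can check a fuse value before flashing it. The function names are hypothetical. By this table, 0xDC selects 512 words at 0x7C00, while 0xDA selects 1024 words at 0x7800.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_BYTES 32768UL   /* ATmega328P flash size */

/* BOOTSZ1:0 are HFUSE bits 2:1; a programmed fuse bit reads as 0. */
uint16_t boot_size_words(uint8_t hfuse)
{
    uint8_t bootsz = (uint8_t)((hfuse >> 1) & 0x03U);
    return (uint16_t)(2048U >> bootsz);   /* 00→2048, 01→1024, 10→512, 11→256 */
}

/* Byte address where the boot section (and the BOOTRST vector) starts. */
uint32_t boot_start_byte(uint8_t hfuse)
{
    return FLASH_BYTES - 2UL * boot_size_words(hfuse);
}

/* BOOTRST is HFUSE bit 0; programmed (0) makes reset start in the boot section. */
bool bootrst_programmed(uint8_t hfuse)
{
    return (hfuse & 0x01U) == 0;
}
```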

 

8.3 — Verify the .hex File

Open build\bootloader.hex in Notepad. The first line should show address 7C00:

 

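:10 7C00 00 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx
    └── 7C00 = our bootloader start address ✅

The fields are: 10 = byte count (16 data bytes), 7C00 = load address, 00 = record type (data), then the data bytes and a one-byte checksum.

If you see 0000 here, the linker flag in build.bat is not working.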
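To check the start address without eyeballing it, you can use a short host-side C sketch. It parses one Intel HEX record as stored in the file (one unbroken line, without the spaces used above) and verifies the checksum: all bytes, including the checksum, must sum to 0 modulo 256. It then extracts the address. The struct and function names are hypothetical. The sample record in the test uses made-up data bytes with a correctly computed checksum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  count;     /* number of data bytes */
    uint16_t address;   /* load address */
    uint8_t  type;      /* 00 = data, 01 = end of file */
} hex_record;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Reads the two hex digits at s[0..1]; returns false on a bad digit. */
static bool hex_byte(const char *s, uint8_t *out)
{
    int hi = hex_nibble(s[0]), lo = hex_nibble(s[1]);
    if (hi < 0 || lo < 0) return false;
    *out = (uint8_t)((hi << 4) | lo);
    return true;
}

/* Parses one record without spaces or line ending.
 * Returns false on a malformed line or a bad checksum. */
bool parse_hex_record(const char *line, hex_record *rec)
{
    size_t len = strlen(line);
    if (len < 11 || line[0] != ':' || (len - 1) % 2 != 0) return false;

    size_t nbytes = (len - 1) / 2;
    uint8_t sum = 0, b;
    for (size_t i = 0; i < nbytes; i++) {
        if (!hex_byte(line + 1 + 2 * i, &b)) return false;
        sum = (uint8_t)(sum + b);
        if (i == 0)      rec->count = b;
        else if (i == 1) rec->address = (uint16_t)(b << 8);
        else if (i == 2) rec->address |= b;
        else if (i == 3) rec->type = b;
    }
    /* count + address (2) + type + data + checksum */
    if (nbytes != (size_t)rec->count + 5) return false;
    return sum == 0;
}
```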

 

8.4 — Flash It

Connect your USBasp and run flash.bat. It will set fuses, flash the bootloader, then set lock bits in that order.

 

8.5 — Test It

Test 1 — Timeout Works

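Power on the board with no USB-Serial connected. Wait 2-3 seconds. Your existing sketch should run. This confirms the 1-second watchdog timeout and jump-to-app are working correctly. If flash.bat erased the chip while writing the bootloader (avrdude typically erases flash before writing it), no sketch is left to run; do Test 2 first, then repeat this test.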

 

Test 2 — Arduino IDE Upload Works

Open Arduino IDE, select Arduino Uno board, select the correct COM port, open the Blink sketch, and hit Upload. You should see bytes written and verified, then the LED starts blinking immediately.
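Each page written during that upload arrives as an STK_PROG_PAGE frame. The frame contains the command byte 0x64, a big-endian byte count, a memory type ('F' for flash), the data, and finally CRC_EOP (0x20). This is the layout the bootloader's STK_PROG_PAGE case reads. The host-side sketch below parses one such frame from a buffer and rejects counts larger than the 128-byte page buffer. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STK_PROG_PAGE 0x64
#define STK_CRC_EOP   0x20
#define PAGE_SIZE     128

typedef struct {
    uint16_t len;               /* data bytes in this frame */
    uint8_t  type;              /* 'F' = flash, 'E' = EEPROM */
    uint8_t  data[PAGE_SIZE];
} prog_page;

/* Frame: 0x64, len_hi, len_lo, type, data[len], 0x20.
 * Returns false for a wrong command, oversized page, short buffer or missing EOP. */
bool parse_prog_page(const uint8_t *buf, size_t n, prog_page *out)
{
    if (n < 5 || buf[0] != STK_PROG_PAGE) return false;
    uint16_t hi = buf[1];                    /* high byte comes first */
    uint16_t lo = buf[2];
    uint16_t len = (uint16_t)((hi << 8) | lo);
    if (len > PAGE_SIZE) return false;       /* would overflow the page buffer */
    if (n < (size_t)len + 5) return false;   /* cmd + 2 len + type + data + EOP */
    out->len  = len;
    out->type = buf[3];
    memcpy(out->data, buf + 4, len);
    return buf[4 + len] == STK_CRC_EOP;
}
```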

 

8.6 — Complete Project Structure

C:\AVR_Bootloader\
├── src\
│   └── main.c          ← the bootloader source
├── build\
│   ├── bootloader.elf  ← compiled binary (intermediate)
│   └── bootloader.hex  ← final hex file (flashed to chip)
├── build.bat           ← compiles main.c → bootloader.hex
└── flash.bat           ← sets fuses, flashes hex, sets lock bits

 

8.7 — Complete Tutorial Summary

 

Ch   Topic                     What We Built
1    How the chip boots        Memory map, BOOTRST fuse, two upload scenarios, startup flow
2    Fuses & memory            BOOTSZ=512 words, start=0x7C00, HFUSE=0xDC, lock bits=0xEF
3    Toolchain                 Build and flash batch scripts for Windows using Arduino IDE tools
4    UART                      uart_init, uart_send, uart_receive: 3 registers, baud rate math
5    STK500v1 protocol         10 commands, handshake, address loading, data transfer
6    Flash self-programming    Erase, fill, write, 128-byte pages, boot.h macros
7    Watchdog & safe jump      1 second window, WDRF detection, clean app handoff
8    Complete bootloader       Everything assembled, built, flashed and tested

✅ Tutorial Complete!

You now have a fully working Optiboot-style bootloader written from scratch, with every line of code tied back to the ATmega328P datasheet.