Saturday, May 16, 2026

ATmega328P Bootloader

A Step-by-Step Tutorial

Writing an Optiboot-style UART Bootloader from Scratch

For Windows | Arduino IDE 1.8.x | ATmega328P @ 16MHz


Chapter 1: How the ATmega328P Boots

 

1.1 — What is the ATmega328P at its Core?

The ATmega328P is an 8-bit microcontroller made by Microchip (formerly Atmel). It contains three separate memory types:

 

Memory   Size   Purpose                          Survives Power Off?
Flash    32KB   Stores your program code         Yes
SRAM     2KB    Stores variables while running   No
EEPROM   1KB    Stores small persistent data     Yes

 

We only care about Flash for our bootloader. SRAM and EEPROM are not involved in booting.

 

📖 Datasheet §1 — "The ATmega328P is a low-power CMOS 8-bit microcontroller based on the AVR enhanced RISC architecture"

 

1.2 — The Program Counter (PC)

The CPU has a special internal register called the Program Counter (PC). It holds the address of the next instruction to execute. The CPU is a machine that does this millions of times per second:

 

loop forever:

    1. Read instruction at address stored in PC

    2. Execute that instruction

    3. Increment PC (or jump if instruction says so)

 

📖 Datasheet §6.3 — "The Program Counter (PC) is 14 bits wide, addressing the 16K word (32KB) program memory space"

 

⚠️ Note: PC counts in WORDS (2 bytes each). 2^14 = 16,384 words = 32,768 bytes = 32KB. You will see both word and byte addresses in the datasheet.
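The word/byte distinction is easy to get wrong, so here is a minimal host-side sketch of the conversion. The function names are hypothetical helpers for illustration, not part of any AVR library.

```c
#include <assert.h>
#include <stdint.h>

/* Flash size of the ATmega328P in bytes (32KB). */
#define FLASH_SIZE_BYTES 0x8000u

/* Convert a word address (what the PC holds) to a byte address. */
uint16_t word_to_byte_addr(uint16_t word_addr)
{
    assert(word_addr < FLASH_SIZE_BYTES / 2);   /* 14-bit PC: 0x0000..0x3FFF */
    return (uint16_t)(word_addr * 2u);
}

/* Convert an even byte address to the matching word address. */
uint16_t byte_to_word_addr(uint16_t byte_addr)
{
    assert(byte_addr < FLASH_SIZE_BYTES && (byte_addr & 1u) == 0);
    return (uint16_t)(byte_addr / 2u);
}
```

For example, the byte address 0x7C00 used throughout this tutorial corresponds to word address 0x3E00, which is the form the datasheet tables usually show.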

 

1.3 — The Flash Memory Map in Detail

The full 32KB Flash with real addresses:

 

BYTE ADDRESS    CONTENT
┌────────────────────────────────────────────┐
│ 0x0000    ← RESET VECTOR                   │
│ 0x0004    ← INT0 vector                    │
│ 0x0008    ← INT1 vector                    │
│ ...       other vectors, 4 bytes each      │
├────────────────────────────────────────────┤
│                                            │
│          APPLICATION SECTION               │
│          (your sketch / program)           │
│                                            │
├────────────────────────────────────────────┤
│ 0x7C00   ← BOOT START (BOOTSZ=512 words)   │
│          BOOTLOADER SECTION                │
│          (our code lives here)             │
│ 0x7FFF   ← TOP OF FLASH                    │
└────────────────────────────────────────────┘
         Total: 32,768 bytes (32KB)

Each vector slot on the ATmega328P is 2 words wide, so the datasheet's word addresses 0x0002 and 0x0004 for INT0 and INT1 become byte addresses 0x0004 and 0x0008.

 

📖 Datasheet §27.5, Table 27-5 — Defines boot section sizes and start addresses

 

1.4 — What Happens in the First Nanoseconds After Reset?

Things that cause a Reset: Power on (VCC rises), RESET pin pulled low, Watchdog Timer timeout, Brown-out detection, Software reset via watchdog.

 

📖 Datasheet §8.1 — "The ATmega328P provides several reset sources"

 

Regardless of reset source, the sequence is always:

 

Reset occurs
    │
    ▼
CPU initializes internally (I/O registers set to their initial values)
    │
    ▼
┌─────────────────────────────────────────────┐
│   CPU checks: Is BOOTRST fuse programmed?   │
└─────────────────────────────────────────────┘
         │                      │
         NO                    YES
         │                      │
         ▼                      ▼
   PC = 0x0000           PC = 0x7C00
   Your app runs         Bootloader runs

 

📖 Datasheet §27.4 — "If the BOOTRST Fuse is programmed, the reset vector is pointing to the Boot Flash start address"

 

1.5 — The BOOTRST Fuse Bit

Fuse bits are configuration bits stored outside Flash in dedicated hardware. They survive power cycling and are written with a programmer like USBasp.

 

⚠️ AVR Fuse Convention: Bit = 1 means UNPROGRAMMED (not active, factory default). Bit = 0 means PROGRAMMED (active). So BOOTRST = 0 means bootloader IS active. This inverted logic confuses everyone at first!

 

1.6 — The Two Upload Scenarios

Scenario A: USBasp Without a Bootloader

The USBasp talks SPI directly to the chip hardware (ICSP header). It writes raw bytes to Flash starting at 0x0000. No bootloader is needed or involved. Even a completely blank chip can be programmed this way. BOOTRST fuse is NOT set, so CPU starts at 0x0000 and your code runs immediately.

 

Scenario B: Arduino IDE Upload (With Bootloader)

The Arduino IDE runs Avrdude which talks to the bootloader running on the chip over UART. A DTR pin pulse triggers a hardware reset, the chip resets with BOOTRST set so PC goes to 0x7C00, the bootloader starts, waits for Avrdude, receives the sketch over UART, writes it to Flash at 0x0000, then jumps to the application.

 

 

                            USBasp (no bootloader)   Arduino IDE (with bootloader)
How it writes Flash         SPI/ICSP hardware        UART + bootloader software
Reset goes to               0x0000 always            0x7C00 (bootloader) first
Bootloader needed?          No                       Yes
Can overwrite bootloader?   Yes (dangerous)          No (lock bits protect it)

 

1.7 — Every Startup, Without Exception

Once a bootloader is installed and BOOTRST is set, every single startup follows this flow:

 

POWER ON or RESET
       │
       ▼
PC = 0x7C00  (BOOTRST forces this)
       │
       ▼
BOOTLOADER RUNS
   1. Initialize UART
   2. Wait ~1 second for Avrdude
       │
   ┌───┴───────────────┐
   ▼                   ▼
Avrdude connects    No response (timeout)
   │                   │
   ▼                   ▼
Receive & flash     Jump to 0x0000
sketch                 │
   │                   ▼
   ▼                YOUR SKETCH RUNS
Reset → bootloader → timeout → 0x0000

 

1.8 — Chapter 1 Summary

 

Concept            Key Fact
Program Counter    Holds address of next instruction, starts at 0x0000 or 0x7C00
Flash layout       Application at bottom (0x0000), bootloader at top (0x7C00)
BOOTRST fuse       0 = programmed = CPU starts at boot section on reset
AVR fuse logic     0 = active, 1 = inactive (inverted, always remember this!)
USBasp upload      SPI directly to hardware, no bootloader needed, writes from 0x0000
Arduino upload     UART to bootloader, bootloader writes sketch to 0x0000
Every startup      If BOOTRST set → bootloader always runs first, then jumps to app
Bootloader's job   Check for new upload → if none, hand control to application

Chapter 2: Bootloader Section & Fuses

 

We only care about 3 things from the fuse system:

 

1. BOOTRST  → Tell CPU to start at bootloader on reset

2. BOOTSZ   → Tell CPU how big our bootloader is (sets start address)

3. Lock bits → Protect bootloader from being overwritten

 

2.1 — BOOTSZ: Choosing Our Bootloader Size

📖 Datasheet §27.5, Table 27-5 — Boot section sizes and start addresses

 

BOOTSZ1   BOOTSZ0   Size (words)   Size (bytes)   Start Address (byte)
1         1         256 words      512 bytes      0x7E00
1         0         512 words      1024 bytes     0x7C00  ← we use this
0         1         1024 words     2048 bytes     0x7800
0         0         2048 words     4096 bytes     0x7000

 

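We pick 512 words (1024 bytes) starting at 0x7C00. Optiboot itself fits into the smallest option (256 words, 512 bytes, starting at 0x7E00), but that takes heavily size-optimized code. 1024 bytes gives our plain C code comfortable headroom, and the 2048+ byte options waste application space.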

 

2.2 — The Fuse Bytes (Only What We Touch)

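The ATmega328P has 3 fuse bytes. We only touch the High Fuse Byte (HFUSE). Our HFUSE value is 0xDC (binary 1101 1100): bit 0 BOOTRST=0 (active) and bits 2:1 BOOTSZ1:0=10 (512 words). Be careful with 0xDA, which appears in older Arduino board definitions: it sets BOOTSZ1:0=01, a 1024-word boot section starting at 0x7800, which does not match our 0x7C00 build.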

 

⚠️ Critical: SPIEN (bit 5) must stay 0 (programmed/active). It enables SPI programming. If you accidentally set it to 1, you can no longer program the chip with USBasp!

 

📖 Datasheet §27.4, Table 27-3 — High Fuse Byte bit description

 

avrdude -c usbasp -p m328p -U hfuse:w:0xDC:m
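Before writing a fuse byte it is worth decoding it by hand or in code. The following is a minimal host-side sketch, assuming the high fuse bit layout described above (BOOTRST in bit 0, BOOTSZ0 in bit 1, BOOTSZ1 in bit 2, SPIEN in bit 5). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the ATmega328P high fuse byte. */
#define HFUSE_BOOTRST  0   /* 0 = reset vector points at boot section */
#define HFUSE_BOOTSZ0  1
#define HFUSE_BOOTSZ1  2
#define HFUSE_SPIEN    5   /* 0 = SPI programming enabled */

/* 1 if a fuse bit is programmed (active). Programmed bits read as 0. */
int fuse_programmed(uint8_t fuse_byte, unsigned bit)
{
    assert(bit < 8);
    return ((fuse_byte >> bit) & 1u) == 0;
}

/* Boot section size in words for BOOTSZ1:0 = 11, 10, 01, 00. */
uint16_t hfuse_boot_words(uint8_t hfuse)
{
    unsigned bootsz = (hfuse >> HFUSE_BOOTSZ0) & 3u;   /* BOOTSZ1:BOOTSZ0 */
    return (uint16_t)(2048u >> bootsz);                /* 00 -> 2048 ... 11 -> 256 */
}

/* Byte address where the boot section starts (top of 32KB flash minus size). */
uint16_t hfuse_boot_start(uint8_t hfuse)
{
    return (uint16_t)(0x8000u - 2u * hfuse_boot_words(hfuse));
}
```

Decoding 0xDC this way gives 512 words and a start address of 0x7C00 with BOOTRST and SPIEN both programmed, while 0xDA decodes to 1024 words starting at 0x7800.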

 

2.3 — Lock Bits: Protecting Our Bootloader

📖 Datasheet §27.6, Table 27-7 — Boot Lock Bit table

 

BLB12   BLB11   Effect
1       1       No restrictions (default, dangerous!)
1       0       Application cannot WRITE to boot section  ← we use this
0       1       Application cannot READ from boot section
0       0       Application cannot READ or WRITE boot section

 

Our Lock byte value is 0xEF. Set lock bits LAST — after the bootloader is flashed and working. Lock bits can only be cleared by a full chip erase which would wipe your bootloader.

 

avrdude -c usbasp -p m328p -U lock:w:0xEF:m
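The value 0xEF follows directly from the inverted fuse logic: start with all bits at 1 and clear only the bits you want programmed. Below is a small host-side sketch, assuming the usual lock byte layout (LB1 in bit 0 up to BLB12 in bit 5). The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Lock byte bit positions (bits 7 and 6 are unused and left at 1 here). */
#define LOCK_LB1    0
#define LOCK_LB2    1
#define LOCK_BLB01  2
#define LOCK_BLB02  3
#define LOCK_BLB11  4
#define LOCK_BLB12  5

/* Build a lock byte in which only the listed bits are programmed (0). */
uint8_t lock_byte(const unsigned *programmed_bits, unsigned count)
{
    uint8_t value = 0xFF;                   /* all bits unprogrammed */
    for (unsigned i = 0; i < count; i++) {
        assert(programmed_bits[i] <= LOCK_BLB12);
        value &= (uint8_t)~(1u << programmed_bits[i]);
    }
    return value;
}
```

Programming only BLB11 (BLB12=1, BLB11=0) clears bit 4 and yields 0xEF, the value used in this chapter.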

 

2.4 — The Complete Fuse Setup

# 1. Set fuses (BOOTRST active, BOOTSZ = 512 words)

avrdude -c usbasp -p m328p -U hfuse:w:0xDC:m

 

# 2. Flash our bootloader binary

avrdude -c usbasp -p m328p -U flash:w:bootloader.hex

 

# 3. Set lock bits LAST (protect bootloader section)

avrdude -c usbasp -p m328p -U lock:w:0xEF:m

 

2.5 — Chapter 2 Summary

 

Thing              Value                    Why
Bootloader size    512 words / 1024 bytes   Small enough, fits our code
Bootloader start   0x7C00                   Calculated from BOOTSZ
HFUSE              0xDC                     BOOTRST=0, BOOTSZ1:0=10
Lock byte          0xEF                     App cannot overwrite bootloader
Set fuses          First                    With USBasp before anything
Set lock bits      Last                     After bootloader is flashed and verified

Chapter 3: Toolchain Setup (Windows)

 

3.1 — Tools Already Installed via Arduino IDE

Since Arduino IDE 1.8.x is installed, you already have everything needed. No downloads required.

 

C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\

    tools\avr-gcc\5.4.0-atmel3.6.1-arduino2\bin\

        avr-gcc.exe        ← the compiler

        avr-objcopy.exe    ← converts compiled output to .hex

        avr-size.exe       ← shows how big our binary is

 

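⚠️ Note: Replace <username> with your actual Windows username everywhere in the scripts below. The version folder names (5.4.0-atmel3.6.1-arduino2 and 6.3.0-arduino17) depend on which AVR core is installed, so check the folders that actually exist on your disk.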

 

3.2 — Our Project Folder

C:\AVR_Bootloader\

├── src\

│   └── main.c          ← our bootloader source code

├── build\              ← compiled output goes here

├── build.bat           ← our build script

└── flash.bat           ← our flash script

 

3.3 — The Build Batch File (build.bat)

Create build.bat in C:\AVR_Bootloader\ with this content:

 

@echo off

REM ATmega328P Bootloader Build Script

 

set AVR=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avr-gcc\5.4.0-atmel3.6.1-arduino2\bin

set SRC=src\main.c

set BUILD=build

set OUT=bootloader

set MCU=atmega328p

set F_CPU=16000000UL

set BOOT_ADDR=0x7C00

 

echo [1/4] Compiling...

%AVR%\avr-gcc.exe -mmcu=%MCU% -DF_CPU=%F_CPU% -Os -std=c99 ^

    -Wl,--section-start=.text=%BOOT_ADDR% ^

    -o %BUILD%\%OUT%.elf %SRC%

if errorlevel 1 goto error

 

echo [2/4] Creating .hex file...

%AVR%\avr-objcopy.exe -O ihex -R .eeprom %BUILD%\%OUT%.elf %BUILD%\%OUT%.hex

if errorlevel 1 goto error

 

echo [3/4] Checking binary size...

%AVR%\avr-size.exe --format=avr --mcu=%MCU% %BUILD%\%OUT%.elf

 

echo [4/4] Done!

echo Output: %BUILD%\%OUT%.hex

goto end

 

:error

echo BUILD FAILED!

:end

pause

 

3.4 — The Most Important Line Explained

-Wl,--section-start=.text=%BOOT_ADDR%

 

-Wl,           → pass this flag to the linker

--section-start → place this section at this address

.text          → where compiled code lives

=%BOOT_ADDR%   → = 0x7C00 (our bootloader start address)

 

3.5 — Other Compiler Flags Explained

 

Flag                 Purpose
-mmcu=atmega328p     Tells compiler exactly which AVR chip, sets correct register addresses
-DF_CPU=16000000UL   Defines F_CPU as 16MHz, used in Chapter 4 for baud rate calculation
-Os                  Optimize for SIZE not speed, bootloader must fit in 1024 bytes!
-std=c99             Use C99 standard, allows cleaner code style

 

3.6 — Flash Script (flash.bat)

@echo off

REM ATmega328P Bootloader Flash Script

 

set AVRDUDE=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\bin\avrdude.exe

set AVRDUDE_CONF=C:\Users\<username>\AppData\Local\Arduino15\packages\arduino\tools\avrdude\6.3.0-arduino17\etc\avrdude.conf

set HEX=build\bootloader.hex

set MCU=atmega328p

set PROGRAMMER=usbasp

 

echo [1/3] Setting fuses...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U hfuse:w:0xDC:m

if errorlevel 1 goto error

 

echo [2/3] Flashing bootloader...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U flash:w:%HEX%:i

if errorlevel 1 goto error

 

echo [3/3] Setting lock bits...

%AVRDUDE% -C %AVRDUDE_CONF% -p %MCU% -c %PROGRAMMER% -U lock:w:0xEF:m

if errorlevel 1 goto error

 

echo All done! Bootloader installed successfully.

goto end

:error

echo FAILED! Check USBasp connection.

:end

pause

 

3.7 — Chapter 3 Summary

 

File        Purpose
build.bat   Compiles src\main.c → build\bootloader.hex
flash.bat   Sets fuses, flashes hex, sets lock bits via USBasp

Chapter 4: UART From Scratch

 

4.1 — What is UART?

UART stands for Universal Asynchronous Receiver Transmitter. It is the simplest way two devices can talk — just two wires: TX (transmit) and RX (receive). Asynchronous means there is no shared clock wire. Both sides must agree on the speed beforehand — called the Baud Rate, measured in bits per second.

 

📖 Datasheet §19.1 — "The Universal Synchronous and Asynchronous serial Receiver and Transmitter (USART) is a highly flexible serial communication device"

 

4.2 — How UART Sends a Byte

When idle, the TX line sits HIGH. To send one byte (8 bits):

 

Frame = 1 start bit + 8 data bits + 1 stop bit = 10 bits total

 

        Idle  Start  D0  D1  D2  D3  D4  D5  D6  D7  Stop  Idle
Level:  HIGH  LOW    x   x   x   x   x   x   x   x   HIGH  HIGH

x = 0 (LOW) or 1 (HIGH). D0, the least significant bit, is sent first.

 

At 115200 baud:

  1 bit  = 1 / 115200 = 8.68 microseconds

  1 byte = 8.68 × 10  = 86.8 microseconds
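To make the frame layout concrete, here is a minimal host-side sketch that lists the line level of every bit in one 8N1 frame and computes the frame duration. It models the wire only and touches no AVR registers. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UART_FRAME_BITS 10   /* 1 start + 8 data + 1 stop (8N1) */

/* Fill levels[] with the line level (0 = LOW, 1 = HIGH) of each bit
 * in an 8N1 frame, in the order the bits appear on the wire. */
void uart_frame_levels(uint8_t byte, uint8_t levels[UART_FRAME_BITS])
{
    assert(levels != 0);
    levels[0] = 0;                                      /* start bit is always LOW  */
    for (unsigned bit = 0; bit < 8; bit++)
        levels[1 + bit] = (uint8_t)((byte >> bit) & 1u); /* data bits, D0 (LSB) first */
    levels[9] = 1;                                      /* stop bit is always HIGH  */
}

/* Duration of one full frame in nanoseconds, rounded down. */
uint32_t uart_frame_time_ns(uint32_t baud)
{
    assert(baud > 0);
    return (uint32_t)((UART_FRAME_BITS * 1000000000ull) / baud);
}
```

At 115200 baud the second function returns 86805 ns, which matches the 86.8 microseconds per byte worked out above.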

 

4.3 — The 5 Registers We Need

Register   Purpose
UBRR0      Sets the baud rate
UCSR0B     Enables transmitter and receiver
UCSR0C     Sets frame format (8 data bits, 1 stop bit)
UDR0       Write here to send, Read here to receive
UCSR0A     Status flags: is TX done? is RX ready?

 

4.4 — UBRR0: Setting the Baud Rate

📖 Datasheet §19.10 — USART Baud Rate Registers

 

The formula to calculate UBRR:

 

        F_CPU
UBRR = ────────── - 1
        16 × BAUD

For 16MHz CPU, 115200 baud:

        16,000,000
UBRR = ──────────── - 1  =  8.68 - 1  =  7.68  ≈  8 (rounded to nearest)
        16 × 115200

 

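📖 Datasheet §19.11, Table 19-9 — Confirms UBRR=8 for 16MHz / 115200 baud. The actual rate is 16,000,000 / (16 × 9) ≈ 111,111 baud, an error of -3.5%. Optiboot does not use this setting: it runs the USART in double-speed mode (U2X0=1, UBRR=16), which brings the error down to +2.1%. If uploads turn out to be unreliable with UBRR=8, switching to double-speed mode is the first thing to try.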

 

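#define F_CPU   16000000UL
#define BAUD    115200UL
/* Add half the divisor before dividing so the result rounds to nearest. */
/* Plain integer division truncates 8.68 to 8 and would give UBRR = 7.   */
#define UBRR    ((F_CPU + 8UL * BAUD) / (16UL * BAUD) - 1)   // = 8

UBRR0H = (uint8_t)(UBRR >> 8);   // high byte first
UBRR0L = (uint8_t)(UBRR);        // low byte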
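Integer division in C truncates, so the way the UBRR formula is written matters. This is a minimal host-side sketch comparing a rounded and a truncated calculation for normal-speed mode (U2X0 = 0). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* UBRR for normal-speed mode (U2X0 = 0), rounded to the nearest integer. */
uint16_t ubrr_rounded(uint32_t f_cpu, uint32_t baud)
{
    assert(baud > 0 && f_cpu >= 16u * baud);
    return (uint16_t)((f_cpu + 8u * baud) / (16u * baud) - 1u);
}

/* UBRR as computed by plain integer division, which truncates. */
uint16_t ubrr_truncated(uint32_t f_cpu, uint32_t baud)
{
    assert(baud > 0 && f_cpu >= 16u * baud);
    return (uint16_t)(f_cpu / (16u * baud) - 1u);
}

/* Baud rate the hardware actually produces for a given UBRR value. */
uint32_t actual_baud(uint32_t f_cpu, uint16_t ubrr)
{
    return f_cpu / (16u * ((uint32_t)ubrr + 1u));
}
```

For 16MHz and 115200 baud the rounded version gives 8 (about 111,111 baud), while the truncated version gives 7 (125,000 baud, roughly 8.5% too fast), which is far enough off that a serial link is unlikely to work.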

 

4.5 — UCSR0B: Enabling TX and RX

📖 Datasheet §19.9.3 — UCSR0B description. We only need RXEN0 (bit 4) to enable receiver and TXEN0 (bit 3) to enable transmitter.

 

UCSR0B = (1 << RXEN0) | (1 << TXEN0);

 

4.6 — UCSR0C: Frame Format

We want 8N1: 8 data bits, No parity, 1 stop bit. This is the format Avrdude uses when it talks to Arduino-style serial bootloaders.

 

📖 Datasheet §19.9.4 — UCSR0C description. UCSZ01:00 = 11 sets 8 data bits. UPM = 00 = no parity. USBS = 0 = 1 stop bit.

 

UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);

 

4.7 — Putting It Together: uart_init()

void uart_init(void)

{

    UBRR0H = 0;   // high byte of 8

    UBRR0L = 8;   // low byte  of 8

    UCSR0B = (1 << RXEN0) | (1 << TXEN0);

    UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);

}

 

4.8 — UCSR0A: Status Flags

📖 Datasheet §19.9.1 — UCSR0A description. UDRE0: TX buffer empty — safe to send next byte. RXC0: New byte waiting to be read.

 

4.9 — uart_send() and uart_receive()

void uart_send(uint8_t byte)

{

    while (!(UCSR0A & (1 << UDRE0)));  // wait until TX buffer empty

    UDR0 = byte;                        // hardware sends it automatically

}

 

uint8_t uart_receive(void)

{

    while (!(UCSR0A & (1 << RXC0)));   // wait until byte arrives

    return UDR0;                         // read and return byte

}

 

4.10 — Chapter 4 Summary

 

Register   What We Set                      Why
UBRR0H:L   8                                115200 baud at 16MHz
UCSR0B     RXEN0 | TXEN0                    Enable RX and TX hardware
UCSR0C     UCSZ01 | UCSZ00                  8N1 frame format
UCSR0A     Read only                        Check UDRE0 before send, RXC0 before receive
UDR0       Write to send, Read to receive   The actual data register

 


 

Chapter 5: The STK500v1 Protocol

 

5.1 — What is STK500v1?

When you hit Upload in Arduino IDE, Avrdude runs and starts sending bytes over UART to our bootloader following a protocol called STK500v1. Every exchange follows the same structure:

 

Every COMMAND Avrdude sends:
┌──────────┬──────────────────────┬──────────┐
│  CMD     │  PARAMETERS          │  SYNC    │
│  1 byte  │  0 or more bytes     │  0x20    │
└──────────┴──────────────────────┴──────────┘
                                    CRC_EOP

Every RESPONSE we send:
┌──────────┬──────────────────────┬──────────┐
│  0x14    │  DATA (if any)       │  0x10    │
└──────────┴──────────────────────┴──────────┘
  INSYNC                            OK
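The framing rule above can be sketched as a small host-side function that takes one received command frame and builds the matching reply. This is only an illustration of the INSYNC / data / OK structure, not the bootloader's actual code path, and stk_build_reply is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STK_OK       0x10
#define STK_INSYNC   0x14
#define STK_NOSYNC   0x15
#define STK_CRC_EOP  0x20

/* Build the reply to one command frame.
 * frame/frame_len : raw command bytes, including the trailing CRC_EOP
 * data/data_len   : payload to return between INSYNC and OK (may be empty)
 * out             : receives the reply, needs room for data_len + 2 bytes
 * Returns the number of reply bytes written. */
size_t stk_build_reply(const uint8_t *frame, size_t frame_len,
                       const uint8_t *data, size_t data_len,
                       uint8_t *out)
{
    assert(frame != NULL && out != NULL && frame_len >= 1);

    /* A frame that does not end in CRC_EOP is out of sync. */
    if (frame[frame_len - 1] != STK_CRC_EOP) {
        out[0] = STK_NOSYNC;
        return 1;
    }

    out[0] = STK_INSYNC;
    if (data_len > 0)
        memcpy(&out[1], data, data_len);
    out[1 + data_len] = STK_OK;
    return data_len + 2;
}
```

A GET_SYNC frame (0x30 0x20) produces the two-byte reply 0x14 0x10, and a READ_SIGN frame with a three-byte signature payload produces five bytes.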

 

5.2 — Protocol Constants

#define STK_OK              0x10

#define STK_INSYNC          0x14

#define STK_NOSYNC          0x15

#define STK_CRC_EOP         0x20

#define STK_GET_SYNC        0x30

#define STK_GET_PARAMETER   0x41

#define STK_SET_DEVICE      0x42

#define STK_SET_DEVICE_EXT  0x45

#define STK_ENTER_PROGMODE  0x50

#define STK_LEAVE_PROGMODE  0x51

#define STK_LOAD_ADDRESS    0x55

#define STK_PROG_PAGE       0x64

#define STK_READ_PAGE       0x74

#define STK_READ_SIGN       0x75

 

5.3 — Commands We Handle

 

Command                 Parameters                             What We Do
GET_SYNC (0x30)         None                                   Reply INSYNC + OK
GET_PARAMETER (0x41)    1 byte param ID                        Reply version number
SET_DEVICE (0x42)       20 bytes                               Drain bytes, reply OK
SET_DEVICE_EXT (0x45)   5 bytes                                Drain bytes, reply OK
ENTER_PROGMODE (0x50)   None                                   Reply OK
LOAD_ADDRESS (0x55)     2 byte word address (low byte first)   Store address, reply OK
PROG_PAGE (0x64)        2 byte length + memory type + data     Write to Flash (Ch6), reply OK
READ_PAGE (0x74)        2 byte length + memory type            Send Flash bytes back
READ_SIGN (0x75)        None                                   Send 0x1E 0x95 0x0F
LEAVE_PROGMODE (0x51)   None                                   Reply OK, jump to app (Ch7)

Every command is terminated by CRC_EOP (0x20) after its parameters.

 

5.4 — Key Command Details

STK_LOAD_ADDRESS — Set Write Address

case STK_LOAD_ADDRESS:

{

    uint16_t lo = uart_receive();

    uint16_t hi = uart_receive();

    address = (hi << 8) | lo;

    /* address is a WORD address — multiply by 2 for SPM */

    get_sync();

    uart_send(STK_INSYNC);

    uart_send(STK_OK);

    break;

}

 

STK_PROG_PAGE — Write Flash

case STK_PROG_PAGE:

{

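    /* Two separate statements: C does not define which operand of | is   */
    /* evaluated first, so one expression could swap the two bytes.       */
    uint16_t len  = (uint16_t)uart_receive() << 8;  // high byte arrives first
    len |= uart_receive();                          // then low byte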

    uint8_t  type = uart_receive();  // 'F' = Flash

    for (uint16_t i = 0; i < len; i++)

        page_buffer[i] = uart_receive();

    get_sync();

    if (type == 'F')

        write_flash_page(address, page_buffer, len);  // Chapter 6

    uart_send(STK_INSYNC);

    uart_send(STK_OK);

    break;

}
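The PROG_PAGE header is a natural place to add a bounds check, since the length comes straight from the wire and is used to index page_buffer. Here is a minimal host-side sketch of decoding and validating the three header bytes. The struct and function names are hypothetical, and the 'E' memory type for EEPROM is noted only for completeness.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 128

/* Parsed header of a PROG_PAGE / READ_PAGE command. */
struct page_header {
    uint16_t length;    /* number of data bytes that follow */
    uint8_t  memtype;   /* 'F' = Flash, 'E' = EEPROM */
};

/* Decode the three header bytes in the order they arrive on the wire:
 * length high byte, length low byte, memory type.
 * Returns 1 if the length fits the page buffer, 0 otherwise. */
int parse_page_header(const uint8_t bytes[3], struct page_header *hdr)
{
    assert(bytes != 0 && hdr != 0);
    hdr->length  = (uint16_t)((bytes[0] << 8) | bytes[1]);
    hdr->memtype = bytes[2];
    return hdr->length <= PAGE_SIZE;
}
```

A header of 0x00 0x80 'F' decodes to a 128-byte Flash page and passes the check, while 0x01 0x00 'F' (256 bytes) is rejected because it would overflow the buffer.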

 

STK_READ_SIGN — Chip Signature

📖 Datasheet §27.8.1 — "The ATmega328P has a three byte signature code: 0x1E, 0x95, 0x0F"

 

case STK_READ_SIGN:

    get_sync();

    uart_send(STK_INSYNC);

    uart_send(0x1E);   // Atmel/Microchip

    uart_send(0x95);   // 32KB Flash

    uart_send(0x0F);   // ATmega328P

    uart_send(STK_OK);

    break;

 

5.5 — The Full Upload Sequence

Avrdude                          Our Bootloader

   │── GET_SYNC (×several) ────────────►│

   │◄── INSYNC + OK ───────────────────│

   │── GET_PARAMETER (SW major/minor) ─►│

   │◄── INSYNC + version + OK ─────────│

   │── SET_DEVICE (20 bytes) ──────────►│ (ignored)

   │── SET_DEVICE_EXT (5 bytes) ────────►│ (ignored)

   │── ENTER_PROGMODE ────────────────►│

   │── READ_SIGN ─────────────────────►│

   │◄── INSYNC + 1E 95 0F + OK ────────│

   │── LOAD_ADDRESS (page 0) ──────────►│ address = 0x0000

   │── PROG_PAGE (128 bytes) ──────────►│ write page to flash

   │── READ_PAGE (verify) ─────────────►│ read back and send

      ... repeat for every page ...   

   │── LEAVE_PROGMODE ────────────────►│

   │◄── INSYNC + OK ───────────────────│

                                  jump to 0x0000

Chapter 6: Flash Self-Programming

 

6.1 — What is Flash Self-Programming?

Normal code reads from Flash. Our bootloader needs to write to Flash — writing the incoming sketch data into the application section. This is called Self-Programming — the chip modifying its own Flash while running.

 

📖 Datasheet §27.1 — "The Boot program can use any available data interface and associated protocol to read code and write (program) that code into the Flash memory"

 

6.2 — The Most Important Constraint: Pages

You cannot write individual bytes to Flash. Flash is organized into fixed blocks called pages. You must write a whole page at a time.

 

📖 Datasheet §27.5 — "The Flash is organized in pages. When programming the Flash, the program data must be written one page at a time"

 

Flash page size = 64 WORDS = 128 BYTES

Total pages     = 256 pages

256 pages × 128 bytes = 32,768 bytes = 32KB
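Because pages are a power of two in size, finding the page that holds a given byte address is simple integer arithmetic. This is a small host-side sketch with hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 128u     /* 64 words */
#define FLASH_SIZE      0x8000u  /* 32KB */

/* Index (0..255) of the page that contains a byte address. */
uint16_t page_index(uint16_t byte_addr)
{
    assert(byte_addr < FLASH_SIZE);
    return (uint16_t)(byte_addr / PAGE_SIZE_BYTES);
}

/* Byte address of the first byte of that page. */
uint16_t page_base(uint16_t byte_addr)
{
    assert(byte_addr < FLASH_SIZE);
    return (uint16_t)(byte_addr & ~(PAGE_SIZE_BYTES - 1u));
}
```

With these helpers, byte address 0x7C00 turns out to be the start of page 248, so pages 0 to 247 belong to the application and pages 248 to 255 hold the bootloader.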

 

📖 Datasheet §27.5, Table 27-5 — "Page Size: 64 words / 128 bytes"

 

6.3 — The 3 Step Writing Process

Step 1 — ERASE the page

         Flash bits can only go 1→0 when writing.

         Erase resets all bits back to 1 (0xFF).

         Must erase before writing new data.

 

Step 2 — FILL the page buffer

         Load your 128 bytes into a temporary hardware

         buffer inside the chip (word by word — 2 bytes at a time).

         NOT written to Flash yet.

 

Step 3 — WRITE the page buffer to Flash

         Hardware copies buffer into actual Flash page.

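         Takes ~3.7ms (up to 4.5ms). The CPU keeps running boot code while
         an RWW page is written (it is halted only for NRWW pages), so we
         wait with boot_spm_busy_wait() until the write has finished.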

 

⚠️ Critical: If you skip the erase step, bits that were 0 stay 0. Your new data gets corrupted. Always erase first.
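The effect of skipping the erase can be shown with a simplified host-side model in which programming can only clear bits, so the stored value becomes the bitwise AND of the old and new data. This is an illustrative assumption about the behavior, not a simulation of the real flash cell, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 128

/* Erase: every bit in the page goes back to 1. */
void sim_page_erase(uint8_t page[PAGE_SIZE])
{
    assert(page != 0);
    memset(page, 0xFF, PAGE_SIZE);
}

/* Write: programming can only clear bits, so the result is old AND new. */
void sim_page_write(uint8_t page[PAGE_SIZE], const uint8_t data[PAGE_SIZE])
{
    assert(page != 0 && data != 0);
    for (unsigned i = 0; i < PAGE_SIZE; i++)
        page[i] &= data[i];
}
```

In this model, writing 0xF0 over a page that still contains 0x0F leaves 0x00, whereas erasing first and then writing leaves the intended 0xF0.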

 

📖 Datasheet §27.3 — Page Erase, Fill Temporary Buffer, Write Page from Temporary Buffer

 

6.4 — The SPM Instruction and boot.h

SPM (Store Program Memory) is a special AVR assembly instruction. We use <avr/boot.h> which wraps it in clean C macros:

 

#include <avr/boot.h>

 

boot_page_erase(byte_address);    // Erase one page

boot_spm_busy_wait();             // Wait for operation to complete

boot_page_fill(byte_address, w);  // Fill one word (2 bytes) into buffer

boot_page_write(byte_address);    // Write page buffer to Flash

boot_rww_enable();                // Re-enable app section reading

 

6.5 — RWW vs NRWW Sections

📖 Datasheet §27.1 — "The Flash memory is organized in two sections: Read-While-Write (RWW) and No Read-While-Write (NRWW)"

 

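On the ATmega328P the RWW/NRWW boundary is fixed by the hardware and does not move with BOOTSZ: byte addresses 0x0000-0x6FFF are RWW and 0x7000-0x7FFF are NRWW. Our bootloader (0x7C00-0x7FFF) sits entirely in NRWW, so it can keep executing while a page in the RWW range is erased or written. The top 3KB of the application section (0x7000-0x7BFF) is also NRWW, and the CPU is simply halted while those pages are programmed. After writing any RWW page, we must call boot_rww_enable() to re-enable reading from the RWW section.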
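The two boundaries involved here (application vs boot section, and RWW vs NRWW) are independent, which a pair of small checks makes explicit. This host-side sketch assumes the datasheet's fixed NRWW start of byte address 0x7000 for the ATmega328P and the 0x7C00 boot start chosen in Chapter 2. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NRWW_START  0x7000u   /* first byte address of the NRWW section (assumed from datasheet) */
#define BOOT_START  0x7C00u   /* boot section start for BOOTSZ = 512 words */
#define FLASH_END   0x7FFFu

/* 1 if boot code can keep executing while this address is programmed. */
int in_rww_section(uint16_t byte_addr)
{
    assert(byte_addr <= FLASH_END);
    return byte_addr < NRWW_START;
}

/* 1 if the address belongs to the application section for our BOOTSZ. */
int in_app_section(uint16_t byte_addr)
{
    assert(byte_addr <= FLASH_END);
    return byte_addr < BOOT_START;
}
```

Note that an address such as 0x7000 is application space but not RWW, so both checks are needed to describe it.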

 

6.6 — write_flash_page(): The Full Function

#define PAGE_SIZE 128

 

void write_flash_page(uint16_t word_addr,

                      uint8_t  *data,

                      uint16_t  length)

{

    /* Safety — never write into bootloader section */

    if (word_addr >= (BOOT_START / 2))

        return;

 

    /* Convert word address to byte address for SPM */

    uint32_t byte_addr = (uint32_t)word_addr * 2;

 

    /* Step 1 — Erase the page (~3.7ms) */

    boot_page_erase(byte_addr);

    boot_spm_busy_wait();

 

    /* Step 2 — Fill page buffer word by word */

    for (uint16_t i = 0; i < length; i += 2)

    {

        uint16_t word = data[i] | ((uint16_t)data[i + 1] << 8);

        boot_page_fill(byte_addr + i, word);

    }

 

    /* Step 3 — Write page buffer to Flash (~3.7ms) */

    boot_page_write(byte_addr);

    boot_spm_busy_wait();

 

    /* Re-enable RWW section for reading */

    boot_rww_enable();

}

 

6.7 — Timing

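📖 Datasheet §27.8.1, Table 27-14 — Page Erase: 3.7ms. Page Write: 3.7ms. Total per page: ~7.4ms. Worst case for a sketch that fills the whole 31KB application section: 248 pages × 7.4ms ≈ 1.8 seconds. These are the minimum programming times, and the datasheet allows up to 4.5ms per operation.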
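The same arithmetic works for any sketch size. This is a minimal host-side sketch, assuming the 3.7ms erase and 3.7ms write figures quoted above and ignoring UART transfer time. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES   128u
#define PAGE_TIME_US      7400u   /* 3.7 ms erase + 3.7 ms write */

/* Number of flash pages a sketch of the given size occupies. */
uint32_t pages_needed(uint32_t sketch_bytes)
{
    return (sketch_bytes + PAGE_SIZE_BYTES - 1u) / PAGE_SIZE_BYTES;
}

/* Time spent erasing and writing those pages, in microseconds. */
uint32_t program_time_us(uint32_t sketch_bytes)
{
    assert(sketch_bytes <= 0x8000u);
    return pages_needed(sketch_bytes) * PAGE_TIME_US;
}
```

A sketch that fills the whole application section (31,744 bytes) needs 248 pages and about 1.84 seconds of pure programming time. In practice the UART transfer of the data adds to this.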

 

6.8 — Chapter 6 Summary

 

Concept        Key Point
Page size      128 bytes, must write whole pages
3 steps        Erase → Fill buffer → Write
SPM            Hardware instruction for Flash writing
<avr/boot.h>   Clean C macros wrapping SPM
Timing         ~7.4ms per page (erase + write)
Word vs byte   Multiply word address × 2 for SPM
RWW            Must call boot_rww_enable() after every write
Guard check    Never write at or above 0x7C00

Chapter 7: Jumping to the App & Watchdog Timer

 

7.1 — What is the Watchdog Timer?

The Watchdog Timer (WDT) is a completely independent hardware timer built into the ATmega328P. It runs on its own internal oscillator — separate from your main clock. Think of it like a dead man's switch: your code must periodically kick the watchdog to prove it is still alive. If it does not kick it in time — the watchdog resets the CPU. No exceptions.

 

📖 Datasheet §10.1 — "The Watchdog Timer is clocked from a separate on-chip oscillator which runs at 128kHz"

 

7.2 — Why Its Own Oscillator?

The Watchdog runs on its own 128kHz oscillator completely independent from the main 16MHz system clock. Even if your code completely locks up in an infinite loop, crashes into invalid memory, or the main oscillator glitches — the watchdog timer keeps counting and resets the chip.

 

7.3 — What Can the Watchdog Do?

📖 Datasheet §10.2 — Watchdog Timer modes

 

Mode                   What Happens When Timer Expires
Reset mode  ← we use   Chip resets immediately. PC goes to 0x0000 or 0x7C00
Interrupt mode         Fires an interrupt. Your ISR handles it. Code keeps running.
Both                   First fires interrupt. If not cleared → then resets.

 

7.4 — Watchdog Timeout Periods

📖 Datasheet §10.3, Table 10-2 — Watchdog Timer prescale select

 

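WDP3:0   Timeout   Use Case
0000     16 ms     Very tight safety loop
0101     500 ms
0110     1 sec     ← we use this (upload window)
0111     2 sec     Most common safety timeout

The prescaler field has four bits (WDP3 to WDP0). The values 1000 and 1001 select 4 and 8 second timeouts.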
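The timeouts in this table follow from the 128kHz oscillator: the shortest setting counts 2048 cycles and each prescaler step doubles that. Here is a minimal host-side sketch of the relationship, using the nominal oscillator frequency. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WDT_OSC_HZ 128000u   /* nominal watchdog oscillator frequency */

/* Oscillator cycles before timeout for prescaler setting WDP3:0 (0..9). */
uint32_t wdt_cycles(uint8_t wdp)
{
    assert(wdp <= 9);
    return (uint32_t)2048u << wdp;   /* 2K cycles at WDP = 0, doubling each step */
}

/* Nominal timeout in milliseconds. */
uint32_t wdt_timeout_ms(uint8_t wdp)
{
    return wdt_cycles(wdp) * 1000u / WDT_OSC_HZ;
}
```

The computed values (16, 512, 1024 and 2048 ms for WDP = 0, 5, 6 and 7) are the exact nominal figures. The datasheet rounds them to 16 ms, 0.5 s, 1 s and 2 s, and the real oscillator varies with voltage and temperature.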

 

7.5 — Why We Need It In Our Bootloader

Without a timeout, our uart_receive() waits forever. If nobody connects, the bootloader is stuck and your sketch never runs. The watchdog timer gives us a 1-second window: if Avrdude connects, we disable the watchdog and proceed with upload. If nobody connects, the watchdog fires, chip resets, we detect WDRF and jump to the app immediately.

 

7.6 — Detecting a Watchdog Reset (MCUSR)

📖 Datasheet §8.4 — MCUSR register — tells us WHY the chip reset. WDRF (bit 3) is set when watchdog caused the reset.

 

/* Very first thing in main() */

uint8_t mcusr = MCUSR;    // save reset reason

MCUSR = 0;                // clear all flags

 

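if (mcusr & (1 << WDRF))
{
    /* Watchdog reset → stop the watchdog and jump straight to the app. */
    /* Do NOT call jump_to_app() here: it triggers another watchdog     */
    /* reset, which would bring us back to this point forever.          */
    wdt_disable();
    ((void (*)(void))0)();   // jump to 0x0000
}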
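The decision made at the top of main() can be written as a pure function of the saved MCUSR value, which makes it easy to reason about on a PC. This is a host-side sketch only: the bit positions are redefined locally (on the target they come from <avr/io.h>), and boot_decide is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* MCUSR reset-flag bit positions on the ATmega328P. */
#define PORF   0   /* power-on reset   */
#define EXTRF  1   /* external reset   */
#define BORF   2   /* brown-out reset  */
#define WDRF   3   /* watchdog reset   */

enum boot_action { START_APP, WAIT_FOR_UPLOAD };

/* Decide what the bootloader does next from the saved MCUSR value. */
enum boot_action boot_decide(uint8_t mcusr)
{
    assert((mcusr & 0xF0u) == 0);          /* upper four bits are unused */
    if (mcusr & (1u << WDRF))
        return START_APP;                  /* timeout or finished upload */
    return WAIT_FOR_UPLOAD;                /* power-on, RESET pin, brown-out */
}
```

Only a watchdog reset leads straight to the application. Every other reset source opens the upload window first.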

 

7.7 — Why We Cannot Just goto 0x0000

Problem 1: The watchdog keeps running. The app never pets it, watchdog fires after 1 second, chip resets, bootloader runs again — infinite loop.

Problem 2: Dirty hardware state. The bootloader has initialized UART and modified hardware state. The app inherits all that instead of a clean reset state and may break.

 

7.8 — The Correct Way: Watchdog Reset Trick

Instead of jumping → use the watchdog to RESET the chip!

 

1. Set watchdog to shortest timeout (16ms)

2. Do nothing (don't pet it)

3. Watchdog fires after 16ms

4. Chip fully resets — clean hardware state!

5. Bootloader runs again

6. Sees WDRF flag in MCUSR → jump to 0x0000 immediately

7. App runs with perfectly clean hardware state ✅

 

📖 Datasheet §10.8 — Watchdog System Reset Mode

 

7.9 — The Code

#include <avr/wdt.h>

 

/* Enable 1 second watchdog */

void watchdog_enable_1s(void)

{

    wdt_enable(WDTO_1S);

}

 

/* Safe jump to application */

void jump_to_app(void)

{

    wdt_enable(WDTO_15MS);  // shortest timeout

    while(1);               // sit and wait for reset

    /* After ~16ms → RESET → clean hardware state */

    /* Bootloader restarts → sees WDRF → jumps to 0x0000 */

}

 

/* Disable watchdog once upload starts */

void watchdog_disable(void)

{

    wdt_reset();

    wdt_disable();

}

 

7.10 — Chapter 7 Summary

 

Concept         Key Point
Watchdog        Independent 128kHz hardware timer
Purpose         Resets chip if code hangs or crashes
Our use         1 second upload window timeout
MCUSR           Tells us WHY the chip reset
WDRF flag       Set when watchdog caused the reset
jump_to_app()   Set WDT to 16ms, loop, let it reset cleanly
Disable WDT     Call once Avrdude connects, because the upload takes time
Clean reset     Watchdog reset gives app a clean hardware state

Chapter 8: The Complete Bootloader

 

8.1 — The Complete main.c

Create this file at C:\AVR_Bootloader\src\main.c

 

/*

 * ATmega328P Bootloader

 * Compatible with Arduino IDE (STK500v1 / Avrdude)

 *

 * Application : 0x0000 - 0x7BFF (31,744 bytes)

 * Bootloader  : 0x7C00 - 0x7FFF (1,024  bytes)

 * HFUSE = 0xDC  LOCK = 0xEF  UART = 115200 8N1

 */

 

#include <avr/io.h>

#include <avr/boot.h>

#include <avr/pgmspace.h>

#include <avr/interrupt.h>

#include <avr/wdt.h>

#include <stdint.h>

 

#define BOOT_START          0x7C00

#define PAGE_SIZE           128

#define F_CPU               16000000UL

#define BAUD                115200UL

#define UBRR_VAL            ((F_CPU + 8UL * BAUD) / (16UL * BAUD) - 1)  /* rounds to nearest: 8 */

 

#define STK_OK              0x10

#define STK_INSYNC          0x14

#define STK_NOSYNC          0x15

#define STK_CRC_EOP         0x20

#define STK_GET_SYNC        0x30

#define STK_GET_PARAMETER   0x41

#define STK_SET_DEVICE      0x42

#define STK_SET_DEVICE_EXT  0x45

#define STK_ENTER_PROGMODE  0x50

#define STK_LEAVE_PROGMODE  0x51

#define STK_LOAD_ADDRESS    0x55

#define STK_PROG_PAGE       0x64

#define STK_READ_PAGE       0x74

#define STK_READ_SIGN       0x75

#define SIGNATURE_0         0x1E

#define SIGNATURE_1         0x95

#define SIGNATURE_2         0x0F

 

static uint8_t  page_buffer[PAGE_SIZE];

static uint16_t address = 0;

 

/* ── UART ── */

void uart_init(void) {

    UBRR0H = (uint8_t)(UBRR_VAL >> 8);

    UBRR0L = (uint8_t)(UBRR_VAL);

    UCSR0B = (1 << RXEN0) | (1 << TXEN0);

    UCSR0C = (1 << UCSZ01) | (1 << UCSZ00);

}

void uart_send(uint8_t b) {

    while (!(UCSR0A & (1 << UDRE0)));

    UDR0 = b;

}

uint8_t uart_receive(void) {

    while (!(UCSR0A & (1 << RXC0)));

    return UDR0;

}

 

/* ── STK500v1 helper ── */

void get_sync(void) {

    uint8_t eop = uart_receive();

    if (eop != STK_CRC_EOP) uart_send(STK_NOSYNC);

}

 

/* ── Flash self-programming ── */

void write_flash_page(uint16_t word_addr, uint8_t *data, uint16_t len) {

    if (word_addr >= (BOOT_START / 2)) return;

    uint32_t byte_addr = (uint32_t)word_addr * 2;

    boot_page_erase(byte_addr);  boot_spm_busy_wait();

    for (uint16_t i = 0; i < len; i += 2) {

        uint16_t word = data[i] | ((uint16_t)data[i+1] << 8);

        boot_page_fill(byte_addr + i, word);

    }

    boot_page_write(byte_addr);  boot_spm_busy_wait();

    boot_rww_enable();

}

 

/* ── Jump to application ── */

void jump_to_app(void) {

    wdt_enable(WDTO_15MS);

    while(1);

}

 

/* ── Main ── */

int main(void) {

    uint8_t mcusr = MCUSR;

    MCUSR = 0;

    if (mcusr & (1 << WDRF)) {

        wdt_disable();

        ((void (*)(void))0)();   // jump to 0x0000

    }

    wdt_enable(WDTO_1S);

    uart_init();

    while (1) {

        uint8_t cmd = uart_receive();

        switch (cmd) {

            case STK_GET_SYNC:

                wdt_disable();

                get_sync();

                uart_send(STK_INSYNC); uart_send(STK_OK);

                break;

            case STK_GET_PARAMETER: {

                uint8_t p = uart_receive(); get_sync();

                uart_send(STK_INSYNC);

                if      (p == 0x80) uart_send(0x02);

                else if (p == 0x81) uart_send(0x01);

                else                uart_send(0x00);

                uart_send(STK_OK); break; }

            case STK_SET_DEVICE:

                for (uint8_t i=0;i<20;i++) uart_receive();

                get_sync();

                uart_send(STK_INSYNC); uart_send(STK_OK); break;

            case STK_SET_DEVICE_EXT:

                for (uint8_t i=0;i<5;i++) uart_receive();

                get_sync();

                uart_send(STK_INSYNC); uart_send(STK_OK); break;

            case STK_ENTER_PROGMODE:

                get_sync();

                uart_send(STK_INSYNC); uart_send(STK_OK); break;

            case STK_LOAD_ADDRESS: {

                uint16_t lo = uart_receive();

                uint16_t hi = uart_receive();

                address = (hi << 8) | lo;

                get_sync();

                uart_send(STK_INSYNC); uart_send(STK_OK); break; }

            case STK_PROG_PAGE: {

                uint16_t len  = (uint16_t)uart_receive() << 8;  /* high byte first */
                len |= uart_receive();                          /* then low byte   */

                uint8_t  type = uart_receive();

                for (uint16_t i=0;i<len;i++) page_buffer[i]=uart_receive();

                get_sync();

                if (type=='F') write_flash_page(address,page_buffer,len);

                uart_send(STK_INSYNC); uart_send(STK_OK); break; }

            case STK_READ_PAGE: {

                uint16_t len  = (uint16_t)uart_receive() << 8;  /* high byte first */
                len |= uart_receive();                          /* then low byte   */

                uint8_t  type = uart_receive(); get_sync();

                uart_send(STK_INSYNC);

                if (type=='F')

                    for (uint16_t i=0;i<len;i++)

                        uart_send(pgm_read_byte((uint32_t)(address*2)+i));

                uart_send(STK_OK); break; }

            case STK_READ_SIGN:

                get_sync();

                uart_send(STK_INSYNC);

                uart_send(SIGNATURE_0);

                uart_send(SIGNATURE_1);

                uart_send(SIGNATURE_2);

                uart_send(STK_OK); break;

            case STK_LEAVE_PROGMODE:

                get_sync();

                uart_send(STK_INSYNC); uart_send(STK_OK);

                jump_to_app(); break;

            default:

                get_sync();

                uart_send(STK_INSYNC); uart_send(STK_OK); break;

        }

    }

    return 0;

}

 

8.2 — Build It

Open a command prompt in C:\AVR_Bootloader\ and run build.bat. The critical check: Program must be under 1024 bytes. That is our bootloader section size.

 

8.3 — Verify the .hex File

Open build\bootloader.hex in Notepad. The first line should show address 7C00:

 

:10 7C00 00 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx

      

       └── 7C00 = our bootloader start address ✅

 

If you see 0000 here — the linker flag in build.bat is not working.
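Instead of reading the file by eye, the first record can also be checked programmatically. The following host-side sketch parses a single Intel HEX record (without trailing newline), verifies its checksum, and extracts the load address. Real records contain no spaces; the spaced-out line above is only for readability. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header fields of one Intel HEX record. */
struct hex_record {
    uint8_t  count;     /* number of data bytes    */
    uint16_t address;   /* 16-bit load address     */
    uint8_t  type;      /* 0x00 = data, 0x01 = EOF */
};

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse one record such as ":027C00000C94E2" and verify its checksum.
 * Returns 1 on success, 0 if the line is malformed or the checksum fails. */
int hex_parse_record(const char *line, struct hex_record *rec)
{
    assert(line != NULL && rec != NULL);
    size_t len = strlen(line);
    if (len < 11 || line[0] != ':' || (len - 1) % 2 != 0)
        return 0;

    size_t nbytes = (len - 1) / 2;
    unsigned fields[4] = { 0, 0, 0, 0 };
    unsigned sum = 0;
    for (size_t i = 0; i < nbytes; i++) {
        int hi = hex_nibble(line[1 + 2 * i]);
        int lo = hex_nibble(line[2 + 2 * i]);
        if (hi < 0 || lo < 0)
            return 0;
        unsigned byte = (unsigned)(hi * 16 + lo);
        if (i < 4)
            fields[i] = byte;      /* count, addr high, addr low, type */
        sum += byte;
    }
    /* Record length must match the count, and all bytes must sum to 0 mod 256. */
    if (nbytes != fields[0] + 5u || (sum & 0xFFu) != 0)
        return 0;

    rec->count   = (uint8_t)fields[0];
    rec->address = (uint16_t)((fields[1] << 8) | fields[2]);
    rec->type    = (uint8_t)fields[3];
    return 1;
}
```

For a correctly linked bootloader, the first data record in bootloader.hex should parse with type 0x00 and address 0x7C00.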

 

8.4 — Flash It

Connect your USBasp and run flash.bat. It will set fuses, flash the bootloader, then set lock bits in that order.

 

8.5 — Test It

Test 1 — Timeout Works

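Power on the board with no USB-Serial connected. Wait 2-3 seconds. Your existing sketch should run. This confirms the 1-second watchdog timeout and jump-to-app are working correctly. Keep in mind that avrdude erases the whole chip before it writes the bootloader, so right after flash.bat the application section is empty. On a freshly flashed chip, run Test 2 first to load a sketch, then come back to this test.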

 

Test 2 — Arduino IDE Upload Works

Open Arduino IDE, select Arduino Uno board, select the correct COM port, open the Blink sketch, and hit Upload. You should see bytes written and verified, then the LED starts blinking immediately.

 

8.6 — Complete Project Structure

C:\AVR_Bootloader\

├── src\

│   └── main.c          ← the bootloader source
├── build\
│   ├── bootloader.elf  ← compiled binary (intermediate)
│   └── bootloader.hex  ← final hex file (flashed to chip)

├── build.bat           ← compiles main.c → bootloader.hex

└── flash.bat           ← sets fuses, flashes hex, sets lock bits

 

8.7 — Complete Tutorial Summary

 

Chapter                      What We Built
1. How the chip boots        Memory map, BOOTRST fuse, two upload scenarios, startup flow
2. Fuses & memory            BOOTSZ=512 words, start=0x7C00, HFUSE=0xDC, lock bits=0xEF
3. Toolchain                 Build and flash batch scripts for Windows using Arduino IDE tools
4. UART                      uart_init, uart_send, uart_receive, 5 registers, baud rate math
5. STK500v1 protocol         10 commands, handshake, address loading, data transfer
6. Flash self-programming    Erase, fill, write, 128 byte pages, boot.h macros
7. Watchdog & safe jump      1 second window, WDRF detection, clean app handoff
8. Complete bootloader       Everything assembled, built, flashed and tested

✅ Tutorial Complete!

You now have a fully working Optiboot-style bootloader written from scratch, with every line of code tied back to the ATmega328P datasheet.