NDS/Tutorials Day 2

From Dev-Scene



Hardware Overview

The Nintendo DS is rich in features. It possesses one of the most advanced 2D rendering systems ever seen on a console, abundant memory resources (many, many times that of the SNES), dual processors capable of outperforming the Nintendo 64 (floating point operations aside), integrated wireless networking, a modest 3D system with an easy-to-understand interface, a microphone, and touch screen input.

What follows is a brief description of these features and a foreshadowing of the things you might accomplish with the knowledge gained in this guide.

Memory Layout

The memory footprint of the Nintendo DS is one of its more intimidating features for newly introduced console programmers. Understanding where memory is and what its uses are is key to getting the most from your applications and in many cases it is key to doing anything at all. Often a picture can be helpful in understanding how memory is arranged.


(Unless otherwise stated the data width for each bus is 16 bits.)

The table below depicts both how these pieces of memory are physically addressed and their actual sizes.

Memory Map

Name                  Start Address   Stop Address   Size     Wait State

ARM9
Main                  0x02000000      0x023FFFFF     4MB      ?
ITCM                  0x00000000      0x00007FFF     32KB     ?
DTCM                  0x0B000000      0x0B003FFF     16KB     ?
Shared WRAM Bank 0    0x03000000      0x03003FFF     16KB     ?
Shared WRAM Bank 1    0x03004000      0x03007FFF     16KB     ?

ARM7
Main                  0x02000000      0x023FFFFF     4MB      ?
BIOS                  0x00000000      0x00003FFF     16KB     ?
IWRAM                 0x03800000      0x0380FFFF     64KB     ?
Shared WRAM Bank 0    0x03000000      0x03003FFF     16KB     ?
Shared WRAM Bank 1    0x03004000      0x03007FFF     16KB     ?

Video RAM
Main OAM              0x07000000      0x070003FF     1KB      ?
Sub OAM               0x07000400      0x070007FF     1KB      ?
Main Palette          0x05000000      0x050003FF     1KB      ?
Sub Palette           0x05000400      0x050007FF     1KB      ?
Bank A                0x06800000      0x0681FFFF     128KB    ?
Bank B                0x06820000      0x0683FFFF     128KB    ?
Bank C                0x06840000      0x0685FFFF     128KB    ?
Bank D                0x06860000      0x0687FFFF     128KB    ?
Bank E                0x06880000      0x0688FFFF     64KB     ?
Bank F                0x06890000      0x06893FFF     16KB     ?
Bank G                0x06894000      0x06897FFF     16KB     ?
Bank H                0x06898000      0x0689FFFF     32KB     ?
Bank I                0x068A0000      0x068A3FFF     16KB     ?

Virtual Video RAM
Main Background       0x06000000      0x0607FFFF     512KB    ?
Sub Background        0x06200000      0x0621FFFF     128KB    ?
Main Sprite           0x06400000      0x0643FFFF     256KB    ?
Sub Sprite            0x06600000      0x0661FFFF     128KB    ?
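
The Size column follows directly from the two address columns: a region that runs from Start to Stop inclusive holds Stop - Start + 1 bytes. As a minimal sketch (the helper name region_size is hypothetical and not part of libnds), the arithmetic can be checked with any desktop C compiler:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a memory region, given its first and last address
   (both inclusive). Hypothetical helper for illustration only. */
static uint32_t region_size(uint32_t start, uint32_t stop)
{
    assert(stop >= start);   /* a region cannot end before it begins */
    return stop - start + 1u;
}
```

For example, region_size(0x06800000, 0x0681FFFF) gives 131072 bytes, which is the 128KB listed for Bank A.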

Main Memory

Start Address : 0x02000000
End Address :   0x023FFFFF
Mirror :        0x02400000

Sometimes referred to simply as main memory, it is the 4 megabyte block of RAM which generally holds your ARM9 executable as well as the vast majority of all game data.

Both the ARM7 and the ARM9 can access this memory at any time. Any bus conflict is resolved in favor of the processor which has priority (the ARM7 by default, but changeable via a control register), causing the other processor to wait until the first has finished its operation.

Although it is possible to execute both ARM7 and ARM9 code from main RAM at the same time, devkitPro defaults to placing the ARM7 code into the 64KB of fast IWRAM for performance reasons. Official games generally place both ARM7 and ARM9 executables into Main Memory, after which the ARM7 copies the majority of its own code to IWRAM.

ARM 7 Fast Ram (IWRAM)

Start Address : 0x03800000
End Address :  0x0380FFFF  

The ARM7 has exclusive access to 64KB of fast 32 bit wide memory. It is this region that contains the ARM7 executable and data. When designing ARM7 code it will be in your interest to keep the binary small.

ARM 9 Caches

The ARM9 contains both a data cache and an instruction cache. Although the operation of these caches is a bit complex and really out of scope for this document a few things are worth noting.

Main memory is cacheable by default. This means all data and code being accessed from main memory will be stored temporarily in the cache. Because the DMA circuitry and the ARM7 do not have access to the cache, you will often get unexpected results if you attempt to DMA from main memory or share data between the ARM7 and ARM9 via main memory.

To help utilize the cache effectively the mirror of main memory that begins above 0x02400000 is not cacheable. There are also several functions provided by the library which allow you to flush the data cache and ensure main memory is in sync.
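
As a rough sketch of the addressing described here, an address in cached main memory can be turned into its mirror address by adding the 4MB offset between 0x02000000 and 0x02400000. The helper name uncached_mirror is hypothetical and the constants are taken from the memory map in this chapter. Whether the mirror really bypasses the cache depends on how the startup code configures the ARM9, so treat this as an illustration of the arithmetic only.

```c
#include <stdint.h>

#define MAIN_RAM_START 0x02000000u  /* first byte of cached main memory */
#define MAIN_RAM_SIZE  0x00400000u  /* 4MB */

/* Hypothetical helper: return the mirror address of a main-memory
   address, or 0 if the address does not lie in main memory. */
static uint32_t uncached_mirror(uint32_t addr)
{
    if (addr < MAIN_RAM_START || addr >= MAIN_RAM_START + MAIN_RAM_SIZE)
        return 0;
    return addr + MAIN_RAM_SIZE;
}
```

The alternative is to keep using the cached address and flush the relevant range before the DMA or ARM7 reads it; libnds declares cache maintenance routines such as DC_FlushRange for that purpose.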

Although the cache adds a certain level of complexity its boost to performance is well worth this small inconvenience.

Fast Shared Ram

There are two small 16KB banks of fast 32-bit RAM that can be assigned to the ARM7 or ARM9. Access to either block by both CPUs at the same time is prohibited. Commonly, both banks will be mapped to the ARM7, as they form a contiguous block with ARM7 IWRAM, effectively giving the ARM7 96KB of RAM.

Video Ram

The Nintendo DS has nine banks of video memory which may be put to a variety of uses. They can hold the graphics for your sprites, the textures for your 3D space ships, the tiles for your 2D platformer, or a direct map of pixels to render to the screen. Figuring out how to effectively utilize this flexible but limited amount of memory will be one of the most challenging endeavors you will face in your first few days of homebrew.

Below is a table of the banks along with a description of the uses to which each can be put. You should not worry about understanding this at the moment, but it might be handy to bookmark or print out for later use.

View large intimidating table

Virtual Video Ram

In order for the 2D systems to function they need RAM. One of the major differences between the 2D graphics engine on the Game Boy Advance and those on the DS is that the DS has almost no memory dedicated to the 2D system. Instead of setting aside a given amount of video memory for the 2D system, it allows you to map the video RAM banks into 2D engine memory space.

This might be a bit difficult to grasp at first. An example might be helpful.

Scenario: You want to render a tile based map to the screen using the main 2D graphics engine.

Because you are an uber Nintendo DS programmer you already know two things:

  1. Where the 2D graphics engine expects the map and tile data to be
  2. What video RAM banks can be mapped to this “virtual” 2D graphics memory to hold your tiles and map.

Solution: Tell the Nintendo DS to map a video RAM bank to the right place. In this case we might map video RAM bank A (VRAM_A) to 0x06000000 for use as 2D background memory, but we could have chosen another bank (it turns out almost all VRAM banks can be mapped to main background memory).


We will revisit this topic when we create our first few 2D demos.

This might seem intimidating and difficult at first, but it does offer you a fair bit of flexibility and power over where everything is.
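
To make the scenario a little more concrete, the sketch below builds the value that would be written to the control register of video RAM bank A to enable the bank and map it as main background memory. The bit layout used here (bit 7 enables the bank, the low three bits select the mapping, and a mapping value of 1 means main background for bank A) is an assumption that should be checked against the libnds headers or a hardware reference, and all names in the block are hypothetical. With libnds on real hardware, the equivalent is, as far as I know, a single call to vramSetBankA(VRAM_A_MAIN_BG).

```c
#include <stdint.h>

/* Assumed layout of a VRAM bank control byte (verify before relying on it):
   bit 7    = bank enable
   bits 0-2 = what the bank is mapped as */
#define BANK_ENABLE       0x80u
#define BANK_A_AS_LCD     0x00u  /* plain bank, visible at 0x06800000 */
#define BANK_A_AS_MAIN_BG 0x01u  /* main background memory at 0x06000000 */

/* Hypothetical helper: combine the enable bit with a mapping selection. */
static uint8_t bank_control(uint8_t mapping)
{
    return (uint8_t)(BANK_ENABLE | (mapping & 0x07u));
}
```

Under these assumptions bank_control(BANK_A_AS_MAIN_BG) evaluates to 0x81.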

Sound Hardware

What would a game be without its complement of blips, bleeps, and chip tunes?

Sound and music are an important piece of any good game and to ensure your next graphical adventure is accompanied by an equally astounding audio experience the DS comes equipped with some impressive hardware.

16 independent audio channels can pump digitized music in 8-bit, 16-bit, or ADPCM format. Each channel has its own frequency, volume, panning, and looping controls, allowing for virtually CPU-free, MOD-quality playback.

Wifi

What is there to say, other than that it supports communication with standard 802.11 access points? A full socket library has been implemented which allows porting of PC network code to the DS.

Input

User input is where the DS excels and is the basis for its much lauded inventive game play. Eight buttons, a four-direction D-Pad, a touch screen, and a microphone make for an interesting mix of possibilities.

Touch Screen

As I am sure you have already noticed, the DS has a touch screen. It is very standard in operation and communicates with the DS via a serial interface on the ARM7. Using the default ARM7 binary from libnds causes the touch screen values to be read once per frame and reported to an area you can reach with the ARM9.

In the next chapter we will cover how to access the touchpad values in code.

Buttons

Button presses are detected by reading registers on the ARM7 and ARM9. Some inputs are only detectable by the ARM7: the lid open/close state, the X and Y buttons, and the pen down “button” are all read on the ARM7 and recorded in shared Main Memory for access by the ARM9.

Our first example demo in the next chapter will include ways to read the buttons.

Microphone

Perhaps one of the most interesting additions to the Nintendo DS was the inclusion of a microphone. I have not played with it much, to be honest, but many interesting ideas come to mind and we will definitely do a demo or two which captures input from the microphone.

Real-Time Clock

Being able to know what time it is to the second is pretty handy. Your game can respond differently based on the time of day, you can tell how long it has been since the player last played the game, or it can be used as a simple in-game clock. And best of all, reading the date and time is a snap.
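
As one small illustration of the first idea, the sketch below picks a greeting from the hour of the day. It uses only the standard C time functions, so it runs on a desktop compiler; whether time() is backed by the real-time clock on the DS depends on the libnds version in use, so treat that part as an assumption. The function names are hypothetical.

```c
#include <time.h>

/* Hypothetical helper: choose a message from the hour of day (0-23). */
static const char *greeting_for_hour(int hour)
{
    if (hour < 0 || hour > 23) return "Hello";
    if (hour < 6)  return "Up late?";
    if (hour < 12) return "Good morning";
    if (hour < 18) return "Good afternoon";
    return "Good evening";
}

/* Look up the current local hour with the standard library and
   fall back to the neutral greeting if the time is unavailable. */
static const char *greeting_now(void)
{
    time_t now = time(NULL);
    struct tm *local = localtime(&now);
    return greeting_for_hour(local ? local->tm_hour : -1);
}
```

A game could call greeting_now() once at start-up and show the result on its title screen.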

Upgradeable Firmware

The firmware on the Nintendo DS is stored in flash memory. It can be overwritten with custom firmware. For more information on how to achieve this check here

Upgrading the firmware is useful to developers because it allows you to bypass the RSA check when downloading wifi demos. This means we can send our own .nds files to our DS via Wifi instead of just officially signed ones. Also, the hacked firmware will check the GBA slot prior to booting. If it finds an .nds file signature it will execute the code automatically, eliminating the need to use a passthrough based device each time you wish to run code from your GBA cart-based flash cart.

If you currently use a passthrough device to boot your .nds files from the GBA slot, upgrading the firmware is an easy and relatively safe process.

Graphics Overview

Believe it or not, the Nintendo DS is a very capable, very advanced graphics powerhouse. It has an interesting combination of 2D and 3D rendering hardware.

2D

The Super Nintendo is considered by many to be the best 2D console ever made (by many I really mean me…nobody else counts). The SNES possessed a 16-bit 3.58MHz processor, 128KB of 8-bit RAM, and 64KB of video RAM. By comparison the Nintendo DS has a 32-bit 66MHz processor, 4MB of main RAM, and 656KB of video RAM, and that’s not counting all its little caches of fast 32-bit RAM nor its second 33MHz processor. This is a very capable machine.

There are two separate graphics cores on the DS. They are referred to as Main and Sub graphics cores. Each core has similar features which vary depending on their mode of operation. The major differences between the cores are as follows:

  • The main core has two extra modes which are capable of rendering large bitmaps.
  • The main core can give up one of its background layers to the 3D engine.
  • The main core can bypass the 2D engine and render from memory to the screen directly in what is often referred to as frame buffer mode.

As alluded to above, each core operates in one of several modes. Below is a table of these modes.

Graphics Modes

Main 2D Engine
Mode           BG0       BG1    BG2            BG3
Mode 0         Text/3D   Text   Text           Text
Mode 1         Text/3D   Text   Text           Rotation
Mode 2         Text/3D   Text   Rotation       Rotation
Mode 3         Text/3D   Text   Text           Extended
Mode 4         Text/3D   Text   Rotation       Extended
Mode 5         Text/3D   Text   Extended       Extended
Mode 6         3D        -      Large Bitmap   -
Frame Buffer   Direct VRAM display as a bitmap

Sub 2D Engine
Mode           BG0       BG1    BG2            BG3
Mode 0         Text      Text   Text           Text
Mode 1         Text      Text   Text           Rotation
Mode 2         Text      Text   Rotation       Rotation
Mode 3         Text      Text   Text           Extended
Mode 4         Text      Text   Rotation       Extended
Mode 5         Text      Text   Extended       Extended

Text backgrounds are general purpose tiled backgrounds; rotation backgrounds are also tiled and can be rotated and scaled. Extended rotation backgrounds support a larger set of tiles (at the expense of a larger map), support more palettes, and can operate in a bitmap mode as well as tiled modes.

These modes and background types will be explored as we go along.

3D

It may not possess the poly pushing, texture blending, hardware pixel shading capabilities of current generation GPUs, but what the Nintendo DS lacks in performance and eye candy it makes up for in features.

Limited to 6144 vertices per frame (about 2048 triangles or 1536 quads), the 3D system might seem a bit sparse. But given the small screen size, a lot can be done with even this small number of available points.

Hardware fog, lighting, and transformation along with non blending texture mapping, toon-shading, and edge anti-aliasing make up a rather impressive set of features for an otherwise lackluster 3D machine.

The 3D core operates as a very OpenGL-like state machine, allowing much of its functionality to be wrapped in GL-compliant code. One major difference between OpenGL and the DS core is the absence of floating point number support. All operations on the DS are carried out in fixed point precision.
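
To illustrate what fixed point arithmetic looks like in practice, here is a minimal sketch using 12 fractional bits, which is the format I understand libnds to use for most 3D values (an assumption worth checking against the libnds headers). The type and function names are hypothetical stand-ins, not the libnds ones.

```c
#include <stdint.h>

/* A fixed point number with 12 fractional bits: real value = raw / 4096. */
typedef int32_t fixed12;
#define FIXED12_SHIFT 12
#define FIXED12_ONE   (1 << FIXED12_SHIFT)

static fixed12 fixed12_from_int(int n)     { return (fixed12)(n * FIXED12_ONE); }
static fixed12 fixed12_from_float(float f) { return (fixed12)(f * (float)FIXED12_ONE); }

/* Multiply in 64 bits so the intermediate product cannot overflow,
   then shift back down to 12 fractional bits. The right shift of a
   negative value is assumed to be arithmetic, as it is on common compilers. */
static fixed12 fixed12_mul(fixed12 a, fixed12 b)
{
    return (fixed12)(((int64_t)a * (int64_t)b) >> FIXED12_SHIFT);
}

/* Drop the fractional bits to get back a whole number. */
static int fixed12_to_int(fixed12 v) { return (int)(v >> FIXED12_SHIFT); }
```

Multiplying 3 by 0.5 this way means multiplying the raw values 12288 and 2048 and shifting right by 12, which gives 6144, the raw form of 1.5.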

If you want to get a jump on 3D look at the 3D examples included with libnds and the NeHe tutorials the source code originated from.

Toolchain Explained

Understanding how your code goes from being a text file to being executed on the Nintendo DS will become very important as your projects progress in complexity. To aid in that understanding we are going to recreate Demo 1 from scratch and build it step by step from the command line.

Before we begin there are a few terms you are likely familiar with but which I feel it necessary to go over anyway.

Compiler

The compiler is the first tool you pass your C source through. It is responsible for interpreting that code and translating it into machine based assembly language. From there the assembly language is further reduced into its binary machine code equivalent by another tool known as the assembler (which the compiler will call for you).

The output of the compiler is generally not executable but is instead in what is known as an object file format. Although the instructions have been translated to machine code binary, the decisions about where that code and associated data is to be physically placed in memory have been left undecided.

Linker

The tool used to combine the object files and determine physical addressing such that functions from a multitude of object files can operate in a coherent fashion is called the linker. By passing your object files to the linker you can produce an executable binary file.

Because the linker is responsible for determining where things should be placed physically within the NDS system it must be told a fair amount of information about the memory layout of the DS. This description is located in a linkscript file which describes both the memory layout of the DS and the way in which we want the different regions of our code to map to it. Here is a small piece of the devkitARM default linkscript for the arm9 (yes there is a separate one for the arm7 since it has a different memory layout).

 OUTPUT_FORMAT("elf32-littlearm", "elf32-bigarm", "elf32-littlearm")
 MEMORY {
 	rom	: ORIGIN = 0x08000000, LENGTH = 32M
 	ewram	: ORIGIN = 0x02000000, LENGTH = 4M - 4k
 	dtcm	: ORIGIN = 0x0b000000, LENGTH = 16K
 	itcm	: ORIGIN = 0x01000000, LENGTH = 32K
 }
 __itcm_start	=	ORIGIN(itcm);
 __ewram_end	=	ORIGIN(ewram) + LENGTH(ewram);
 __eheap_end	=	ORIGIN(ewram) + LENGTH(ewram);
 __dtcm_start	=	ORIGIN(dtcm);
 __dtcm_top	=	ORIGIN(dtcm) + LENGTH(dtcm);
 __irq_flags	=	__dtcm_top - 0x08;
 __irq_vector	=	__dtcm_top - 0x04;
 __sp_svc	=	__dtcm_top - 0x100;
 __sp_irq	=	__sp_svc - 0x100;
 __sp_usr	=	__sp_irq - 0x100;

There is much more to this script, most of which is utterly incomprehensible and any of which can have extremely difficult to understand consequences if meddled with. It is good to understand that these scripts exist and what their general purpose is, but actually editing them is well beyond the scope of this document.

The snippet above was chosen because it is somewhat comprehensible; it describes the 4 primary memory regions of the DS that will be used to contain code and data.

  • rom is the GBA cartridge space; it is 32MB in size and begins at an absolute address of 0x08000000
  • ewram is external working ram and is the slow 4MB of main memory for the DS.
  • dtcm stands for data tightly coupled memory and is a special area of memory on the ARM9 intended for use as fast data memory. The standard link script places the stack in this area. dtcm is a mere 16k so be careful with those local variables.
  • itcm stands for instruction tightly coupled memory and is another special area intended for use as fast instruction memory. This area is 32k and may be used for small functions which need to be fast. libnds uses this area for the interrupt dispatcher.
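
The symbol assignments in the linkscript snippet are plain address arithmetic. As a sketch, the same calculations can be reproduced in C to see where the stacks end up. The macro names below mirror the linkscript symbols but are otherwise hypothetical, and the values assume the snippet shown above is unchanged.

```c
#include <stdint.h>

#define DTCM_ORIGIN 0x0b000000u
#define DTCM_LENGTH (16u * 1024u)

#define DTCM_TOP   (DTCM_ORIGIN + DTCM_LENGTH)  /* __dtcm_top   */
#define IRQ_FLAGS  (DTCM_TOP - 0x08u)           /* __irq_flags  */
#define IRQ_VECTOR (DTCM_TOP - 0x04u)           /* __irq_vector */
#define SP_SVC     (DTCM_TOP - 0x100u)          /* __sp_svc     */
#define SP_IRQ     (SP_SVC - 0x100u)            /* __sp_irq     */
#define SP_USR     (SP_IRQ - 0x100u)            /* __sp_usr     */

/* Bytes between the start of dtcm and the initial user stack pointer.
   The user stack and anything else placed in dtcm must share this space. */
static uint32_t dtcm_below_user_stack(void)
{
    return SP_USR - DTCM_ORIGIN;
}
```

With these values the user stack pointer starts at 0x0b003d00 and grows downward, leaving 15616 bytes of dtcm beneath it, which is why large local variables deserve some care.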

The rest of the script file deals with mapping of your code to these regions (read only data, code, variables, stack, global data, constructors for C++ stuff, etc). It does this using an obscure and rather intimidating expression type language that I will not even pretend to understand completely. Fortunately a few people like Jason Wilkins, Jeff Frohwein, and most recently (and most successfully) Dave Murphy have done the grunt work; what was once something which caused countless "interesting" issues can now be relied upon confidently to just work.

Generally speaking there is no need to worry about the linkscript unless you have some pressing need to change where things go in memory. Everything except the stack goes in main memory by default, so all you have to worry about is fitting everything into 4MB.

Build A Demo The Hard Way

To sum things up, you first compile your source files into object files and then link them into a binary executable. Normally this would be the end of the process but, alas, our little DS is a bit more complex, as it has not one but two processors and each needs its own separate binary executable.

What are we to do? Create both of course. The process is identical with the only difference being we use the linkscript for ARM7 when linking the arm7 object files.

The final step in the process is packaging the binaries into a single file that we can then load onto the DS. Fortunately Darkfader has created a tool for us (included in the devkitPro package) which does just this. Official NDS game carts happen to use a file format which suits our needs rather well. This format includes a small header with an embedded logo and a short description of the .nds file in several languages, followed eventually by the executable binaries for the arm7 and arm9. The included logo and description text show up when you boot a game card from the firmware or start to download one over wireless multiboot. (After a bit of investigation it turns out the logo is not actually embedded in the header but exists separately; it is, however, pointed to by the header.)

Now that we have a feel for the process let us create a full .nds file from the command line (so we can confidently never do it again).

Okay…one quick thing to mention. When we created demo1 you may have noticed we had no arm7 code in the project. The reason we are able to get away with this is there is an arm7 executable binary already present in the devkitPro package that you can include by default. This binary performs some very basic things such as read the touch pad and real time clock as well as some very simple sound playback. For anything more advanced you will be providing your own arm7 code or at least modifying the supplied code.

Now…on to the demo. Follow the instructions from Day 1 on building your first demo with the following exception: Instead of grabbing the arm9 template grab the template labeled combined. Within you will find an arm9 folder and an arm7 folder. Replace the code in the template.c inside the arm9 folder with the demo1 code (or leave it the same as it does not really matter what we compile for this exercise).

You will notice a makefile inside the combined directory. If you were to navigate to that directory in a DOS/terminal window (try this now in fact), you could simply type make and the scripts contained inside the makefile would do all the steps we are about to do by hand.

Here are the commands needed to build the nds file from the command line.


Before we explain what is going on it is best to take a moment and absorb just how much compiling from the command line sucks…

Good, now that we know let us figure out what we did.

What is not shown is me setting my path so that the devkitARM tools could be found. On Windows this is simply:


First we invoked the compiler on the arm9 template.c file. This translated it into an object file (template.o). We passed it the file as an argument as well as the include directory for the libnds header files we are using. Because libnds does different things depending on whether you are constructing an arm7 or an arm9 binary, we must define ARM9 with the -D option.
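
The effect of -D can be sketched in plain C. The option defines a preprocessor symbol before compilation starts, and the code can then select between alternatives with conditional compilation. The function below is a hypothetical illustration of the mechanism, not libnds code.

```c
/* Reports which processor this file was compiled for.
   Build with -DARM9 or -DARM7 to select a branch; with neither
   symbol defined, the fallback branch is used. */
static const char *target_cpu(void)
{
#if defined(ARM9)
    return "ARM9";
#elif defined(ARM7)
    return "ARM7";
#else
    return "unknown";
#endif
}
```

Only one of the three return statements survives preprocessing, so the object file contains just the code for the chosen processor.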

Next we link the file into an executable (.elf file). We pass the object file as an argument, we tell it we would like to include libnds (-lnds) and then we tell it where to look for the linkscript and what default libraries to use (-specs=ds_arm9.specs).

Because our loader does not handle the .elf format very easily, we strip away all that extra info using objcopy. This leaves us with a nice flat binary for execution. (The .elf file contains debug information and other things which are useful; you will need the .elf file to use the remote debugger or the source level debugger in no$gba if you can afford such luxuries.)

This entire process is repeated for the arm7 leaving us with an arm7.bin and an arm9.bin. We next combine these binaries into an .nds file using ndstool.

If there is anything you should take from all this it is the convenience of the template makefiles. All you do is drop .c/.cpp/.s files into the source directories for the processors, .h files into the include directories, and .bin data files into the data directory (more on this when we talk about getting your data into your program), and type make. The script automates the entire painful process described above, reducing it to the single command: make.

Conclusion

Today we took a short peek at the capabilities of the DS and learned a bit more about the process of creating an executable from code. Many of the hardware descriptions were intentionally vague, as the real detail will come in the following chapters.

Tomorrow we will begin looking at the hardware in detail when we explore raster graphics.

Dev-Scene (c)