Great Commodities

I’ve been reading Paul Graham’s essays for many years and almost always find something insightful. His latest post, Let the Other 95% of Great Programmers In, is no exception.

However, more great programmers will not help Silicon Valley.

Most US companies are based on a strongly-typed hierarchy whose evolutionary path is entropic and bureaucratic. This means shallow leadership, ineffective hiring practices and the inability to identify and reward greatness. A programmer cannot be a commodity if his or her value depends on this cluster-fuckery, and as a non-commodity he or she is indistinguishable.

I wish that Graham didn’t think of programmers as commodities to begin with. Maybe he doesn’t, but I don’t know how he could have written the essay otherwise.


Dear Spaghetti Coder

Dear Spaghetti Coder,

I grasp that you can’t be bothered with declaring your allegiance, once and for all, to a particular brace style. I know you like to “mix it up”.

I understand your need to never delete anything and instead leave in long blocks of old, unusable, commented-out code. You never know when you might need it.

I realize that there is never time to make real comments in your code, particularly anywhere near your numerous long, difficult switch cases. It’s not your fault you had to hard-code all those strings.

I know you must be clever since you use so many inexplicable, often funny, variable names. You’re such a show off.

I see that you keep re-writing the same unoptimized, two-to-four-banger nested loop functions over and over again instead of wrapping them all into a single, elegant function. You like to flex your muscles.

I’m sorry that you’ve been hurt bad by the Tab key. You’re appropriately working out this issue in your code instead of paying for expensive therapy.

And I grok your single-minded desire to arrange your code in such an arbitrary fashion that we get to Treasure-Hunt our way through your gamified tome. It must give you endless hours of DungeonMaster-like pleasure.

Sincerely,
He Who Must Fix All Your Crap And Make It Actually Work

P.S. I think you should consider a move into management. No, really.

Painting to Texture on iOS

Draw Something has continued to do very well in the App Store and now we’re seeing more derivatives — apps and games with basic painting and sketching capabilities. Last weekend I had some fun playing around with a basic painting setup, just to see how much brush (pun intended) I’d have to go through to get to the painting picnic.

On iOS, there are really only a couple of ways to implement a painting app — Quartz or OpenGL ES (there’s a nice little walk-through using Quartz here, and Apple put together a cool little OGL example called GLPaint here).

It should be relatively clear that the OGL approach is cleaner, more flexible and a bit faster. But while GLPaint is a nice place to start, it’s not very app- or engine-friendly in the context of a full-fledged OGL app. Since the point of GLPaint’s example code is to demonstrate how to do basic painting, it has no need to consider the rest of your OGL surface’s render loop, nor does it concern itself with other important engine pieces, like your OGL redundancy state checker, sorting and synchronization issues between render and update calls, and most important, clearing the buffer.

That last bit really is most important because a nicely-performing painting app should never clear the buffer. Doing so will quickly slow things down to a slide-show. This should be obvious: In order to paint to the screen — whether you’re using GL_POINT_SPRITE_OES or rolling your own quads — you’ll need to draw a ton of sprites on-screen to get a continuous line of color and/or texture. If you clear every frame, you have to re-draw every frame, and voila, you’ll have molasses in less time than it takes to launch the simulator. If you don’t clear, you’re only drawing a handful of new sprites each frame.

The GLPaint example does this — it doesn’t clear the buffer. However in a real-world app, you must clear every frame in order for anything else — GUI elements/textures, mesh rendering, camera changes, etc. — to work. Hence the conundrum — you need a nice, normal clear/render loop but you also need a render-only call each time you want to paint.

Luckily there’s a straightforward solution: paint to a texture, then render that texture in your normal render loop. Setting up a texture to paint to is easy: just attach it to an FBO. For example, a full-screen texture buffer:

- (GLenum) CreateRenderTexture
{
    m_texW = [[UIScreen mainScreen] bounds].size.width;
    m_texH = [[UIScreen mainScreen] bounds].size.height;

    glGenFramebuffers(1, &m_texFrameBuffer);
    glBindFramebuffer(GL_FRAMEBUFFER, m_texFrameBuffer);

    glGenTextures(1, &m_texTexture);
    glBindTexture(GL_TEXTURE_2D, m_texTexture);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, m_texW, m_texH,
        0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
        GL_TEXTURE_2D, m_texTexture, 0);

    return glCheckFramebufferStatus(GL_FRAMEBUFFER);
}

From there it’s simply a matter of drawing to the texture and only clearing the texture when you need to, e.g., by calling a function like this:

- (void) StartTextureRender:(BOOL)clear color:(COLOR)color
{
    glBindFramebuffer(GL_FRAMEBUFFER, m_texFrameBuffer);
    if (clear)
    {
        glClearColor(color.r, color.g, color.b, color.a);
        glClear(GL_COLOR_BUFFER_BIT);
    }
    glViewport(0, 0, m_texW, m_texH);
    // setup ortho matrix, render and client states here...
}

One of the cool things about Draw Something is that it records and replays your drawing. This is relatively straightforward functionality to implement, and GLPaint kinda-sorta does it as a nice bonus. However, its implementation is on the oddball side and a bit shy of readable, that-makes-sense-to-me production code. A clearer way to implement it is to do a standard 2D lerp between the current touch (as you move your finger on the screen) and the last touch. Then record the time it took between finger-down and finger-up for playback later. For instance:

- (void) Draw:(float)x y:(float)y
{
    if (numVerts == MAX_PATH_VERTS) return;
    end         = Vec2(x, y);
    dist        = Vec2Dist(start, end);
    num         = (int)dist;
    if (num > 0)
    {
        startVert   = numVerts;
        numVerts    = MinInt(numVerts + num, MAX_PATH_VERTS);
        num         = numVerts - startVert; // clamp the draw count too
        for (int i = startVert; i < numVerts; i++)
        {
            Vec2Lerp(&verts[i], start, end, (i - startVert) / dist);
        }
        time += [Engine GetTimeElapsed];
    }
    start = end;
    [self DrawRender];
}

Below is the entire class (note that several of the vars are structs elsewhere in the engine, but you get the gist).

// INTERFACE
#define MAX_PATH_VERTS  20000
@interface Path : NSObject
{
@public
    VEC2            verts[MAX_PATH_VERTS];
    int             numVerts;
    int             startVert;
    VEC2            start;
    VEC2            end;
    VEC2            cur;
    Texture*        texture;
    COLOR           color;
    float           size;
    float           tick;
    float           time;
    int             num;
    float           dist;
    BOOL            replaying;
    int             vertCount;
    int             curVert;
    int             endVert;
}
@property (nonatomic, readwrite) BOOL replaying;
- (id)   initWithColorTextureSize:(COLOR)c texture:(Texture*)t size:(float)s;
- (void) DrawStart:(float)x y:(float)y;
- (void) Draw:(float)x y:(float)y;
- (BOOL) Replay;
- (void) ReplayStart;
@end

// IMPLEMENTATION
@implementation Path
@synthesize replaying;
- (id) initWithColorTextureSize:(COLOR)c texture:(Texture*)t size:(float)s
{
    if (!(self = [super init])) return nil;
    numVerts        = 0;
    texture         = t;
    color           = c;
    size            = s;
    return self;
}
- (void) DrawRender
{
    if (num > 0)
    {
        glEnablePointSprite(GL_TRUE, size);
        glSetTexture(texture.index);
        glSetColor(color.r, color.g, color.b, color.a);
        glSetVertexPointerEx(&verts[0], sizeof(VEC2), 2);
        glDrawArrays(GL_POINTS, startVert, num);
    }
}
- (void) DrawStart:(float)x y:(float)y
{
    if (numVerts == MAX_PATH_VERTS) return;
    verts[0]    = start = end = Vec2(x, y);
    numVerts    = num = 1;
    startVert   = 0;
    time        = 0;
    [self DrawRender];
}
- (void) Draw:(float)x y:(float)y
{
    if (numVerts == MAX_PATH_VERTS) return;
    end         = Vec2(x, y);
    dist        = Vec2Dist(start, end);
    num         = (int)dist;
    if (num > 0)
    {
        startVert   = numVerts;
        numVerts    = MinInt(numVerts + num, MAX_PATH_VERTS);
        num         = numVerts - startVert; // clamp the draw count too
        for (int i = startVert; i < numVerts; i++)
        {
            Vec2Lerp(&verts[i], start, end, (i - startVert) / dist);
        }
        time += [Engine GetTimeElapsed];
    }
    start = end;
    [self DrawRender];
}
- (BOOL) Replay
{
    if (replaying)
    {
        tick    = Min(tick + [Engine GetTimeElapsed], time);
        curVert = endVert;
        endVert = (int)Min(Lerp(0, numVerts, tick / time), numVerts);
        dist    = Vec2Dist(start, end);
        end     = verts[endVert];
        for (int i = startVert; i < endVert; i++)
        {
            Vec2Lerp(&verts[i], start, end, (i - startVert) / dist);
        }
        start = end;
        int count = endVert - curVert;
        if (count > 0)
        {
            glEnablePointSprite(GL_TRUE, size);
            glSetTexture(texture.index);
            glSetColor(color.r, color.g, color.b, color.a);
            glSetVertexPointerEx(&verts[0], sizeof(VEC2), 2);
            glDrawArrays(GL_POINTS, curVert, count);
        }
        replaying = (endVert != numVerts);
    }
    return replaying;
}
- (void) ReplayStart
{
    curVert   = 0;
    endVert   = 0;
    replaying = (curVert < numVerts);
    if (replaying)
    {
        tick  = 0;
        time  = Max(time, 0.001); // avoid divide-by-zero in tick / time
        start = verts[0];
        end   = verts[1];
    }
}
@end

An NSMutableArray of multiple instances of this class is kept by the caller; each instance is born on finger-down (where we set color, brush texture and size) and dies on finger-up. Replay is easy — essentially just a programmatic rendering of the verts that were previously recorded while iterating over the NSMutableArray, handled with a flag in the Render() function. Below is the basic idea.

- (void) TouchDown:(float)x y:(float)y
{
    if (curSize == -1) // we're erasing here
    {
        [Engine StartTextureRender:YES color:curBackColor];
    }
    else
    {
        [Engine StartTextureRender:NO color:curBackColor];
        curPath = [[Path alloc] initWithColorTextureSize:curColor
                                                 texture:curTexture
                                                    size:curSize];
        [paths addObject:curPath], [curPath release];
        [curPath DrawStart:x y:y];
    }
}
- (void) TouchMove:(float)x y:(float)y
{
    [Engine StartTextureRender:NO color:curBackColor];
    [curPath Draw:x y:y];
}
- (void) TouchUp:(float)x y:(float)y
{
    curPath = nil;
}
- (void) Render
{
    [Engine RenderToTexture];
    if (replaying)
    {
        [Engine StartTextureRender:NO color:curBackColor];
        if (![curPath Replay])
        {
            curPath = nil;
            for (Path* m in paths) { if (m.replaying) { curPath = m; break; } }
            replaying = (curPath != nil);
        }
    }
}

One of the cool things about using OGL for painting and sketching is that you can very easily change up the brush texture, for nice Photoshop-like texture brushes (care should be taken with how you set up the blending, however, due to pre-multiplied alpha on iOS). While it’s possible to do this with Quartz, it’s much easier to grok using OGL. And of course you can do silly/fun stuff like paint a background behind your 3D orc model (maybe there’s a game idea in there somewhere, hmm — ok, maybe not).

FAIL: Virtues of a Programmer

In the second edition of Programming Perl, Larry Wall famously outlined the Three Virtues of a Programmer:

1. Laziness — The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don’t have to answer so many questions about it.

2. Impatience — The anger you feel when the computer is being lazy. This makes you write programs that don’t just react to your needs, but actually anticipate them. Or at least pretend to.

3. Hubris — Excessive pride, the sort of thing Zeus zaps you for. Also the quality that makes you write (and maintain) programs that other people won’t want to say bad things about.

Besides the humor, anyone who has seriously slung code for a living connects with the three virtues. Programmers very often do have a wonderful kind of procrastinately excitable arrogance that enables them to do remarkable things. I like to say, “A programmer is both the laziest and hardest-working person on the planet — nobody works harder to find the easiest way to do something.”

But when it comes to the job of interviewing, the three virtues are dangerous. I hate to think of all the mistakes made by companies who let untrained programmers interview candidates; if you could convert all those missed hires to ROI, it would most certainly scare the crap out of the CTO. But it’s still prevalent, and not just at the usual suspects like Google and Facebook. It’s alive and well at startups, too.

Maybe it’s getting worse. Over the last several months I’ve heard more complaints than usual from engineering friends and associates about their interview experiences. And Glassdoor seems to be full of stupid code problems and bad experiences from interviewees. It’s always the same: One or more programmers show up to conduct an interview. They’re a little late, somewhat nervous, and dispense with small talk as if it’s a virus. They’re not into answering questions about their company, can’t talk in detail about what they’re working on, and need to look at their phones whenever possible.

Then it comes: Ye olde academic probleme. It’s a fairly rote, perhaps even difficult, [typically] algorithmic problem that has little or nothing to do with their day-to-day software engineering. In fact it’s a safe bet that the programmers giving the problem couldn’t work it themselves if they hadn’t recently played with it. To top it off, they want you to code it up on a whiteboard (or worse, over IM/video chat).

It would be laughable if it wasn’t so bad for the job seeker — and if it wasn’t so bad, ultimately, for the company looking to hire (especially startups).

First off, it’s patently absurd to ask a programmer to code something on a whiteboard. Design? Sure. Program flow? Fine. Basic approach to solving a problem? Okay. But solving an academic problem, on-the-spot, with pseudocode? Ridiculous. Programming is as much art as science and programmers need time to think, write, run and debug problems, especially difficult ones. They need to be in the zone — comfortable, engaged, with their tools and libs (and functions they stopped re-writing years ago that they now simply copy and paste) by their side. This is the case for every programmer I’ve ever worked with, from CS majors to PhDs to the self-taught liberal arts majors, dropouts and homegrowners with no formal education.

Second, the code problem(s). Just why are they so academic? Other than maybe someone who is fresh out of college with no substantive personal code base, who among us actually rewrites a sorting algorithm from scratch, or revisits an LRU cache problem for fun, or proudly displays our awesome linked list skills, or spends our evenings re-hashing time complexity for that old Connect Four or Game of Life problem from school? (Note: I’m not suggesting that time complexity isn’t super-important, but you don’t have to know how to write the Wikipedia article on Big O to understand it). But even if you know the problem ahead of time, can you really code it up nice and sweet, in something resembling workable code, in 10 or 15 minutes — in an interview, on a whiteboard? (Especially if you’re already coding for a living every day anyway?)

I could go on but you get the point. I’ve interviewed lots of programmers over the years and I’ve never put them through the above rigamarole. The best code test is given to the programmer before the interview, basta. The best evaluation is looking at real code that compiles. The best person for the job is almost always the engineer who is a great general problem-solver — he or she is far more valuable than rote language skill or recent exposure to a specific problem or class of algos, for example. Sure there are exceptions and times when you need more of a code-chimp — a trained warrior — than anything else, but 99% of the time you need wizards and mentalists and thieves. Those are the guys and gals who get things done, are constantly improving, and who work with or without a paycheck. You just have to teach them how to be good interviewers.

Terrain on iOS

Over the holiday I decided to play with a new terrain implementation for an upcoming Kineplay title. Terrain rendering is nothing new and there are a lot of approaches to it — check out the dozens of implementations at the Virtual Terrain Project alone. While I’ve implemented terrain several times over the years, for mobile it’s always been an extra challenge — almost always degenerating into a small quadtree with no real LOD (too expensive), ridiculously few triangles (polygon limits on most mobile platforms before iOS) and horribly small textures (never enough memory).

With iOS, there’s a bit more room and with a careful implementation, interactive frame rates are possible with a very large terrain that actually looks halfway realistic. There are plenty of caveats, but at least it’s more-or-less doable. This post is about the more interesting bits in the implementation I wrote over the break, with some code snippets that I hope other aspiring mobile terrain makers will find useful. Unfortunately I can’t publish a complete project since the code is integrated with our engine. If I can ever find the time, I’ll do a follow-up with a standalone project.

There are lots of interesting challenges with terrain, but the most important is LOD (Level of Detail). LOD is often deployed for other geometry in a game, but it’s crucial for terrain. Terrain LOD is how accurately the terrain is rendered, based on some condition(s). For example, a completely flat terrain only needs one quad — two triangles — to accurately render it. For a complex mountain-side, far more polygons would be needed.

This can ramp up very fast for complex morphology — in fact, maintaining good frame rates even on non-mobile platforms can be a challenge. While an Xbox 360 or PC title has orders of magnitude more polys to work with than iOS, a large, high-resolution terrain can still significantly impact the poly budget. For example, just one 256×256 grid of quads takes 66,049 vertices and 131,072 triangles (65,536 quads). Many grids that size would be needed to very nicely represent, say, a 2km square area. With even 64 grids you’re already over eight million tris — over two million after frustum culling. And if the terrain is the least bit interesting, it’s all fill-rate happy.

This is where terrain LOD comes in, in a big way. There are two basic types of LOD:

1. Static LOD is the terrain’s representation based on its heightfield complexity. The fun part of static LOD is how to seamlessly join neighboring vertices at different LODs.

2. Dynamic LOD is the terrain’s representation based on the distance between the camera and the terrain’s vertices. The joy of dynamic LOD is figuring out the best way to avoid excessive “popping” from one LOD to the next.

Lots of Master’s theses have been written about these (see the Virtual Terrain Project link above) — more on dynamic LOD, since it tends to be harder to get right and has to be done every frame. For this blog post, I’ll focus on static LOD. For terrain on iOS it’s a must, and how you implement it will have an impact on how you implement dynamic LOD. I’ll cover my dynamic LOD scheme in a future post. For my implementation, the important steps for static LOD were:

Heightfield Data. I generated a 16-bit .raw file using Grome (a great program from Quad Software which I recommend, although the interface will take some getting used to). 16-bit is important: For a large terrain, an 8-bit heightfield simply does not provide enough resolution to accurately represent hills and mountains, for example. Note that the dimensions of the heightmap are 2^n + 1 — not 2^n (for example, 1025×1025 instead of 1024×1024). If you think of each pixel as the height of a vertex at a given (x, y) position, with the sub-pixel space between pixels as a quad, this makes sense. You can also visualize a pixel as a quad with four sub-pixels in each corner, with the sub-sub-pixel space defining a sub-quad, although IMO that’s overkill. Loading a 16-bit .raw file is easy, by the way (in this example, mapDataLen and mapData are class variables — an int and GLushort*, respectively):

- (void) LoadRaw16:(NSString*)filename
{
    mapDataLen = GetFileLen(filename);
    if (mapDataLen > 0)
    {
        FILE* file = GetBundleFileBinary(filename);
        mapData    = malloc(mapDataLen * sizeof(GLushort));
        fread(mapData, 1, mapDataLen, file);
        fclose(file);
    }
}

Chunk It Up. A common idea behind most real-time terrain engines is to create chunks of terrain, at some minimum and maximum resolution (based on static and dynamic LOD requirements, for example). Chunks are a great way to set up the terrain, giving us a way to quickly determine what parts of the terrain to render. In my implementation, I pass in the number of chunks I want to create in the init function — 32, for example, which would create 32×32 or 1024 chunks of terrain. Each chunk will end up with some resolution — the number of quads in the chunk — based on the static LOD requirements of the terrain morphology. Call it the maxLevel — it specifies the maximum n for the n×n grid of quads in each chunk. At a maxLevel of 16, for example, one chunk of terrain will use 256 quads, or 512 triangles, to render it.

There are several ways to compute the static LOD for a chunk. A fairly easy, fast way is to guess at how co-planar the chunk is — how few triangles it needs — based on its four corner height values. Sample each corner from the raw height data, add ’em up and take the average. In practice you’ll need a bias to “force” the chunk’s resolution higher or lower, depending on how much overall resolution you want to see and how much rendering time you’re willing to give over to terrain. In my implementation, I also found it convenient to normalize my co-planarity value to (0, 1) by computing the min/max height for the chunk and dividing by the min/max range (multiplied by the bias, also in (0, 1)). Having the value normalized made it easy to pass it to a function in my main Terrain class that could quickly return the actual LOD needed.

Patch Class. To save a few tons of memory, I created a terrain patch class that stores indices to vertices for each of the LODs needed to render a chunk of terrain. This kind of thing has been done before, and there are a number of ways in which to do it. Many of them like to focus on creating full patches that add single-row strips of quads to connect up LODs. In my implementation, to optimize fetching patches by easing up on the number of memcpy calls to fill index buffers, I opted for a larger number of patches (with some repetitive data) that cover the LOD differences for each level (from 1 to maxLevel) for each of four potential neighboring patches at all possible levels (1 to maxLevel).

That last sentence was a mouthful — hopefully something visual will help. The image below shows the patches needed for each level with a maxLevel of 16. In each row, every quad except for the last two consists of four patches (for each neighbor — left, right, top and bottom) that connect that level to all the other potential LODs up to maxLevel. In the first row, for example, the first quad — let’s call it qLOD — equals 1, and the LOD for its left, right, top and bottom — we’ll call it nLOD — is also 1. For the second quad in row 1, qLOD = 1 and nLOD = 2, the third quad is qLOD = 1 and nLOD = 4, and so on. Each step is double the last one. So for qLOD = 1, the possible steps up to maxLevel are 1, 2, 4, 8 and 16. For qLOD = 2 (the second row), the possible steps are 2, 4, 8 and 16 — hopefully you get the picture:

Patch Levels

The last two quads in each row are the “internal partial” and full patches needed to complement the edge patches. An internal partial patch consists of the rest of the quads needed after the edge patches are computed (note that qLOD = 1 doesn’t need it). The full patch is for cases where all neighbors match.

The construction of the patches is clockwise (for OpenGL, GL_FRONT face culling), and built from the middle of a quad to the extents of each side (left, right, top, bottom). It ends up as a single subdivision for each triangle according to nLOD. For example, the first three quads above will produce the following triangles (shown green-filled) for nLOD levels 1, 2 and 4, for the left side:

This patching scheme is based on the observation that an edge only needs a special patch when the neighboring chunk has a higher LOD. Accordingly, when the patches are requested at runtime, care must be taken to request them in the right order, with logic that checks neighboring chunks for higher LODs.

Below is a snippet from the code that computes the patches. patchCount is the maxLevel; count is the current LOD to compute (qLOD above); NF, NP, NL, NR, NT and NB are the neighbor levels (nLOD above), corresponding to full, partial, left, right, top and bottom (again, NF covers the case where a full patch is needed — all neighbors match the current chunk — while NP covers the patch needed when one or more edge patches are changed). IETerrainPatchBuffer stores an index buffer for each patch, which references an array of 2^n + 1 vertices per side.

/////////////////////////////////////////////////////////////////////////////
// ComputePatch
// -- winding order is clockwise (GL_FRONT)
/////////////////////////////////////////////////////////////////////////////
- (void) ComputePatch:(int)count NF:(int)NF NP:(int)NP
                                 NL:(int)NL NR:(int)NR NT:(int)NT NB:(int)NB
{
    IETerrainPatchBuffer* m = nil;
    int n, x, y, ncount, nsize, max = count - 1,
                                 size = patchCount / count, half = size / 2;
    IE_POINT2D p1, p2, p3, p4, start, center, center2;
    if (NF)
    {
        m = [[IETerrainPatchBuffer alloc] initWithCount:count * count * 6
                                             patchCount:patchCount];
        m.type      = PATCH_F;
        m.typeCount = count;
        for (x = 0; x < count; x++)
        {
            p1.x = x * size;
            p2.x = p1.x + size;
            p3.x = p2.x;
            p4.x = p1.x;
            for (y = 0; y < count; y++)
            {
                p1.y = y * size;
                p2.y = p1.y;
                p3.y = p1.y + size;
                p4.y = p3.y;
                [m AddQuad:p1 p2:p2 p3:p3 p4:p4];
            }
        }
    }
    else if (NP && count != patchCount)
    {
        m = [[IETerrainPatchBuffer alloc] initWithCount:max * max * 6
                                             patchCount:patchCount];
        m.type      = PATCH_P;
        m.typeCount = count;
        for (x = 0; x < max; x++)
        {
            p1.x = half + x * size;
            p2.x = p1.x + size;
            p3.x = p2.x;
            p4.x = p1.x;
            for (y = 0; y < max; y++)
            {
                p1.y = half + y * size;
                p2.y = p1.y;
                p3.y = p1.y + size;
                p4.y = p3.y;
                [m AddQuad:p1 p2:p2 p3:p3 p4:p4];
            }
        }
    }
    else if (NL >= count && count != patchCount)
    {
        ncount      = NL / count;
        nsize       = patchCount / NL;
        int tcount  = (count * ncount + (count - 1)) * 3;
        m           = [[IETerrainPatchBuffer alloc] initWithCount:tcount
                                                       patchCount:patchCount];
        m.type      = PATCH_L;
        m.typeCount = NL;
        p1.x        = 0;
        p2.x        = 0;
        center.x    = half;
        start.x     = 0;
        for (y = 0; y < count; y++)
        {
            center.y = y * size + half;
            start.y  = y * size;
            for (n = 0; n < ncount; n++)
            {
                p1.y = start.y + (n * nsize);
                p2.y = p1.y + nsize;
                [m AddTriangle:center p2:p2 p3:p1];
            }
            if (y < max)
            {
                center2.x = center.x;
                center2.y = (y + 1) * size + half;
                [m AddTriangle:center p2:center2 p3:p2];
            }
        }
    }
    else if (NR >= count && count != patchCount)
    {
        ncount      = NR / count;
        nsize       = patchCount / NR;
        int tcount  = (count * ncount + (count - 1)) * 3;
        m           = [[IETerrainPatchBuffer alloc] initWithCount:tcount
                                                       patchCount:patchCount];
        m.type      = PATCH_R;
        m.typeCount = NR;
        p1.x        = patchCount;
        p2.x        = patchCount;
        center.x    = patchCount - half;
        start.x     = patchCount;
        for (y = 0; y < count; y++)
        {
            center.y = y * size + half;
            start.y  = y * size;
            for (n = 0; n < ncount; n++)
            {
                p1.y = start.y + (n * nsize);
                p2.y = p1.y + nsize;
                [m AddTriangle:center p2:p1 p3:p2];
            }
            if (y < max)
            {
                center2.x = center.x;
                center2.y = (y + 1) * size + half;
                [m AddTriangle:center p2:p2 p3:center2];
            }
        }
    }
    else if (NT >= count && count != patchCount)
    {
        ncount      = NT / count;
        nsize       = patchCount / NT;
        int tcount  = (count * ncount + (count - 1)) * 3;
        m           = [[IETerrainPatchBuffer alloc] initWithCount:tcount
                                                       patchCount:patchCount];
        m.type      = PATCH_T;
        m.typeCount = NT;
        p1.y        = 0;
        p2.y        = 0;
        center.y    = half;
        start.y     = 0;
        for (x = 0; x < count; x++)
        {
            center.x = x * size + half;
            start.x  = x * size;
            for (n = 0; n < ncount; n++)
            {
                p1.x = start.x + (n * nsize);
                p2.x = p1.x + nsize;
                [m AddTriangle:center p2:p1 p3:p2];
            }
            if (x < max)
            {
                center2.x = (x + 1) * size + half;
                center2.y = center.y;
                [m AddTriangle:center p2:p2 p3:center2];
            }
        }
    }
    else if (NB >= count && count != patchCount)
    {
        ncount      = NB / count;
        nsize       = patchCount / NB;
        int tcount  = (count * ncount + (count - 1)) * 3;
        m           = [[IETerrainPatchBuffer alloc] initWithCount:tcount
                                                       patchCount:patchCount];
        m.type      = PATCH_B;
        m.typeCount = NB;
        p1.y        = patchCount;
        p2.y        = patchCount;
        center.y    = patchCount - half;
        start.y     = patchCount;
        for (x = 0; x < count; x++)
        {
            center.x = x * size + half;
            start.x  = x * size;
            for (n = 0; n < ncount; n++)
            {
                p1.x = start.x + (n * nsize);
                p2.x = p1.x + nsize;
                [m AddTriangle:center p2:p2 p3:p1];
            }
            if (x < max)
            {
                center2.x = (x + 1) * size + half;
                center2.y = center.y;
                [m AddTriangle:center p2:center2 p3:p2];
            }
        }
    }
    if (m)
    {
        m.count = count;
        [patches addObject:m];
        [m release];
    }
}

 

Terrain Nodes. The last important piece is a simple node class for each chunk of terrain. It does the following:

  1. Store the position of the chunk — its (x, z) grid position on the terrain
  2. Compute and store the static LOD for the chunk
  3. Store the heightfield values from the main raw map data and copy them to the main vertex array (in the main terrain class) each frame
  4. Manage Update and Render functions called from a quadtree in the main terrain class — a node is the leaf in the quadtree

Another class manages the branches of the quadtree and holds pointers to its leaf nodes. The main sauce in the node class, besides the static LOD calculation, is maintaining pointers to patches and rendering them. The following code does the updating. parent is the main terrain manager class, patch holds the pre-computed patch buffers selected by the node's LOD, zones are the node's neighboring nodes, and parentZone is the branch to which the node belongs, which matters because the node must update the parent zone's overall bounds for all the nodes it contains.
 

- (void) UpdatePatchLevels
{
    if ([self NoGenerateNeeded]) return;
    if (level == parent.maxLevel || [self NeighborsMatch])
    {
        patches[PATCH_F] = [patch GetBuffer:level type:PATCH_F];
    }
    else
    {
        for (int i = 4; i--;)
        {
            if (zones[i]->level > level)
               patches[i] = [patch GetBuffer:level type:i typeCount:zones[i]->level];
            else
               patches[i] = [patch GetBuffer:level type:i typeCount:level];
        }
        patches[PATCH_P] = [patch GetBuffer:level type:PATCH_P];
        patches[PATCH_F] = nil;
    }
    int mapRealSize     = parent.mapRealSize;
    GLushort* mapData   = parent.mapData;
    int x               = pos.x;
    int y               = pos.y;
    IE_VEC3* v          = &patch->verts[0];
    for (int i = 0; i < numHeights; i++, v++)
    {
        heights[i] = mapData[(int)(x + v->x) + (int)(y + v->z) * mapRealSize];
        if (heights[i] > max.y) max.y = heights[i];
        if (heights[i] < min.y) min.y = heights[i];
    }
    cen.y = (min.y + max.y) * 0.5;
    [parentZone UpdateBounds:self];
}

 

Once updated, a node simply renders its patches, as shown below. Note that the copy to heights updates a single heightfield array (holding the per-frame heightfield values for a node's chunk) that, in this case, has already been enabled and handed to OpenGL as a single vertex attribute:
 

glEnableVertexAttribArray(ATTRIB_HEIGHT);
glVertexAttribPointer(ATTRIB_HEIGHT, 1, GL_FLOAT, GL_FALSE, sizeof(float), &heights[0]);

 

Normally the “height”, the y value of the vertex, would go in as a member of a vertex struct. Using a separate attribute here is purely a convenience and appears to have no real impact on performance. Also note that in the first line of the Render function below, the shader receives the node's (x, z) position, which is added to each vertex position in the shader; again, no significant impact on performance.
 

- (void) Render:(IETerrainNode*)node
{
    [parent->shader SetVec2:IEU_NODE_POS vector:node.pos];
    memcpy(heights, node->heights, numVerts * sizeof(float));
    int numIndices = 0;
    if (node->patches[PATCH_F])
    {
        numIndices = node->patches[PATCH_F].numIndices;
        glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT,
                           node->patches[PATCH_F].indices);
    }
    else
    {
        for (int i = 0; i < PATCH_F; i++)
        {
            if (node->patches[i])
            {
                memcpy(&renderIndices[numIndices], node->patches[i]->indices,
                             node->patches[i].numIndices * sizeof(GLushort));
                numIndices += node->patches[i].numIndices;
            }
        }
        if (numIndices)
        {
            glDrawElements(GL_TRIANGLES, numIndices,
                         GL_UNSIGNED_SHORT, renderIndices);
        }
    }
    numTriangles += numIndices / 3;
}

 

In practice both the iPhone 4 and 3GS can handle a reasonable number of Update and Render calls and still deliver solid interactive frame rates (30 fps and higher). For 1024 nodes (chunks), for example, frustum culling typically leaves a few hundred visible nodes, averaging, say, four memcpy calls per patch (most patches are not full). Adding dynamic LOD can shrink the draw calls somewhat, however the savings are not significant enough to justify all the extra machinery needed to manage popping and artifacts (I’ll try to cover this in a future post). Additional occlusion culling — horizon culling for example — will have a greater positive impact on performance.