31 March, 2007

Unstdio

Progress has been continuing, although a little slowly the past few weeks. I've had a bit of "coder's block" and instead decided to start taking care of odds and ends: naming the engine, getting web hosting, and the mundane task of creating documentation.

The name of the engine is now Unstdio. I've registered www.unstdio.com. I must say, Netfirms and Google Apps are nice and simple, and I was able to put together a few pages quickly enough. I highly recommend that combo for anyone that isn't super web savvy (like myself), but still has content they want to put online.

There's still nothing up for downloading, but I hope to start getting things up in the next couple weeks or so. Hopefully I can learn enough PHP so that I'm not copy/pasting HTML code all over the internet. I've never been one for web pages. :-)

And to give a few updates on new features and changes...

The resource system is just about in place. It's much better now, and resources can now be referenced by name instead of by filename. I've been putting that off for quite a while now, and it's nice to almost be done with it.

I've been slowly wrapping the Newton Game Dynamics library so games can have 2D physics built-in. It will probably be a while before this is completely implemented, but I'm hoping to make it dirt simple to wrap a physics body around a sprite actor and have simple callbacks for collisions and transform updates.

There should now be a decent number of comments in each of the top-level game object classes. I figure it will be a while before I get real documentation, so I'm trying to make this as good as possible for others. For example, here's an excerpt from the GameScene object class:
A GameScene is the heart of gameflow and logic. Examples of simple scenes for a Tetris game might be the main menu, basic playing, and the high score. When the game is told to enter a scene, the #exit method is called on the current scene and then the #enter: method is called for the new scene, with the old scene as an argument. Note: this won't actually happen until the end of the current frame during the handshake.

Each frame, the #advance: method is called (with the delta time - in seconds - since the last frame) as well as the #logic method. The #advance: method is only called when the engine is not paused, and is typically where the scene will advance actors that are in the scene. However, #logic is called every frame, regardless of the engine state, and is typically where user input is processed (quit, un/pause, etc). After advancing the frame and processing logic, #render is called by the engine, allowing the scene to perform any necessary rendering. Like #logic, this is called regardless of the current engine state.

Once the entire frame has been advanced, processed, and rendered, before any scene changes happen, #handshake is called. This is a catch-all, post-frame method, where the scene can update anything that wasn't taken care of during the frame. For example, if actors were killed, and need to be removed from a list, this would be when that happens. While this isn't a true handshake, the term is being reserved for future use when (and if) the engine is ever multi-threaded, allowing the rendering to happen simultaneously with the simulation.
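The call order described in that comment can be sketched as a single method. This is only an illustration of the ordering, not the engine's actual code: the selector #runFrame: and the instance variables paused, currentScene, and nextScene are names assumed for the sketch, on a hypothetical engine class.

```smalltalk
runFrame: dt
    "Hypothetical engine method. paused, currentScene and nextScene
     are assumed instance variables, not names from the real engine."

    "Simulation only moves forward while the engine is not paused."
    paused ifFalse: [currentScene advance: dt].

    "Input handling, rendering and post-frame cleanup run every frame."
    currentScene
        logic;
        render;
        handshake.

    "A requested scene change is applied only after the handshake."
    nextScene notNil ifTrue: [
        currentScene exit.
        nextScene enter: currentScene.
        currentScene := nextScene.
        nextScene := nil]
```

Deferring the scene switch to the end of the frame means a scene never sees #exit in the middle of its own #advance: or #render.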

I'm slowly putting together an idea for a package documentation generator, that will generate HTML documentation for a package and all the classes in it. If something like that already exists, though, I'd be much appreciative if someone can point me to it.

16 March, 2007

Looking Good

I've made a static pool of particles - for emitter actors - that can be recycled instead of being destroyed and recreated over and over again. This has greatly reduced the number of garbage collections that are happening to well within acceptable limits.
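As a rough workspace sketch of the recycling idea (not the engine's pool class): plain Arrays stand in for particles here, and the names free, acquire, and release are made up for the example.

```smalltalk
"Workspace sketch of a recycling pool. Arrays stand in for particles;
 free, acquire and release are illustrative names only."
| free acquire release p |
free := OrderedCollection new.
200 timesRepeat: [free addLast: (Array new: 4)].

"Take a particle from the pool; answers nil when the pool is empty."
acquire := [free isEmpty ifTrue: [nil] ifFalse: [free removeLast]].

"Hand a particle back so it can be reused instead of collected."
release := [:particle | free addLast: particle].

p := acquire value.
p at: 1 put: 0.5.      "pretend to use the particle"
release value: p.
free size              "Display It: the pool is back to 200"
```

All 200 objects are allocated once up front, so emitting and killing particles afterwards creates no new garbage.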

The framerate is back up to over 400 again with the single effect playing, which consists of an average of 160 alpha-blended, rotating, scaling, and tinted particles. Much better.

I also took a little time to get the collision detection optimized a bit. Eventually I'd like to get a simple quad tree in the playfields, but right now O(n^2) collision detection isn't much of an issue for an Asteroids clone.
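For reference, the brute-force approach looks roughly like the workspace sketch below. It is an assumption-laden stand-in, not the engine's code: each actor is faked as an Array of x, y, and radius, and overlap is tested with bounding circles.

```smalltalk
"Workspace sketch: brute-force circle overlap test, O(n^2) in the
 number of actors. Each actor is faked as an Array of (x y radius)."
| actors hits a b dx dy reach |
actors := Array
    with: #(0 0 10)
    with: #(3 4 10)
    with: #(100 100 10).
hits := OrderedCollection new.
1 to: actors size - 1 do: [:i |
    i + 1 to: actors size do: [:j |
        a := actors at: i.
        b := actors at: j.
        dx := (a at: 1) - (b at: 1).
        dy := (a at: 2) - (b at: 2).
        reach := (a at: 3) + (b at: 3).
        "Compare squared distances: no square root, no temporary Point."
        (dx * dx) + (dy * dy) <= (reach * reach)
            ifTrue: [hits add: (Array with: i with: j)]]].
hits    "Display It: one pair, actors 1 and 2"
```

Every pair is visited exactly once, which is fine for a few dozen asteroids; a quad tree would only pay off with many more actors.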

All told, it's been a pretty productive week of optimizations, and I'm back on track to implementing more features that need done.

13 March, 2007

Full Speed Ahead!

When I added the first real particle effect into the game and saw the framerate drop from 480 to 150, I knew something was terribly wrong. Since then, the past few days have consisted of me in optimization mode.

I've since been in contact with Object Arts (awesome support, guys!) and been learning more about their compiler and VM implementation. I've written compilers myself, so this knowledge isn't falling on deaf ears. I'd rather not quote anything said or try and paraphrase comments for fear of stating something out of context. But, here's a quick summary of my optimizations so far....

Part 1: Points

Point objects are evil! Simply put, do not use them for temporary values unless they are only being used rarely - and I do mean rarely. Definitely not something to use in an #advance: method (called every frame for every actor).

I went through my transform class and added a handful of new methods that would not cause new points to be created. Similarly, I went into each method's code and made sure I wasn't calling methods with temporary points, and decided to inline any code that required this.

Last, I added some destructive, loose methods to the Point class. These were for rotating, normalizing, and getting the squared magnitude, without the need for creating a new, temporary Point object. And I then went through all the game code and made the necessary changes needed to support these adjustments.
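A minimal sketch of what such loose methods might look like, written as method source to be added to Point. The selector names are invented for illustration, and the code assumes Point stores its coordinates in instance variables named x and y.

```smalltalk
"Hypothetical loose methods on Point; selector names are illustrative
 and Point is assumed to have instance variables x and y."

squaredMagnitude
    "Answer x*x + y*y without allocating a Point."
    ^(x * x) + (y * y)

normalizeInPlace
    "Scale the receiver to unit length, overwriting x and y.
     A zero-length point is left unchanged."
    | r |
    r := self squaredMagnitude sqrt.
    r = 0 ifFalse: [x := x / r. y := y / r]

rotateInPlaceBy: radians
    "Rotate the receiver about the origin, overwriting x and y."
    | c s newX |
    c := radians cos.
    s := radians sin.
    newX := (x * c) - (y * s).
    y := (x * s) + (y * c).
    x := newX
```

With these in place, evaluating p := 3 @ 4. p normalizeInPlace. leaves p at 0.6 @ 0.8 without creating a second Point. The trade-off is that the receiver is mutated, so these must never be sent to a point that other objects share.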

These changes alone brought the terrible-case framerate from 150 up to about 300. It's great that a few changes could accomplish so much, but I think it's sad that the Point class is poorly implemented. Hopefully in the future Object Arts can make points a primitive type and really improve their performance.

Part 2: ByteArrays

Still, the framerate was not where I wanted it. So, I went through code snippets I wrote, originally thinking they would be obvious optimizations, to confirm that they actually were. These areas were typically ByteArrays that were pre-allocated and used for matrix transformations and similar state settings. After all, a single call to glLoadMatrixf() is much faster than 4 calls to glLoadIdentity(), glTranslatef(), glRotatef(), and glScalef(). Right? Sadly, no. It became obvious very quickly that this wasn't the case. I'll just sit back now and let some sample code and timings do the talking....
transform
    "Create a 2D matrix FAST!"

    ^viewTM
        _11: x * scale x;
        _12: y * scale y;
        _21: y * scale x negated;
        _22: x * scale y;
        _41: origin x;
        _42: origin y;
        _43: z;
        yourself
The above code should not take longer to execute than OpenGL calls, but it does. I'm sure OpenGL does some very good assembly level optimizations under the hood for matrix creation and multiplication, but the above code is extremely trivial. Opening up a Workspace and doing some simple timings shows some interesting results:
Time microsecondsToRun: [10000 timesRepeat: [
    GL
        glLoadIdentity;
        glTranslatef: 1.0 y: 1.0 z: 1.0;
        glRotatef: 40 x: 0 y: 0 z: 1;
        glScalef: 1 y: 1 z: 1]].
The above results (on my machine) in a ~2500 microsecond timing. That's pretty damn good for a foreign function interface (FFI). Now, just faking a similar MATRIX setup and slamming it through...
Time microsecondsToRun: [10000 timesRepeat: [
    GL
        glLoadMatrixf: (b
            _11: 2.0; _12: 2.0;
            _21: 2.0; _22: 2.0;
            _41: 2.0; _42: 2.0;
            bytes)]].
This results in ~6500 microseconds. That's more than 2.5 times what's required using just OpenGL! I highly doubt that #glLoadMatrixf: has any significant overhead above any other external method call. But it's possible that there is GP fault checking due to passing a buffer to an external call. Hopefully not.

My assumption at the moment is that there is a good amount of VM overhead for #floatAtOffset:put: in the ByteArray class - most likely due to bounds checking and type coercion. If this is the case, hopefully I can convince Object Arts to add a kind of UnsafeByteArray (or at least some "unsafe" methods to ByteArray).

Given the above information, I changed the code over to just make straight OpenGL calls. I also created a GameColor class instead of using a ByteArray for calls to glColor3fv().

These changes netted another significant win and the same particle effect (with about 200 particles) in game is now running at a solid 350 FPS. That's about a 200 FPS gain since my last post. Not too shabby, but still more room for improvement.

Part 3: What's next?


I'm going to continue correspondence with Object Arts regarding these issues. The more I learn about the inner workings of Dolphin the better I'll be able to push it for the benefit of others. Likewise, hopefully I can convince them of the need for certain functionality and it'll be a win-win scenario.

Beyond that, I'm in the process of trying to create a fixed-size pool implementation for particles. Garbage collections on individual particles are still happening far too often with how quickly particles are created and destroyed. Giving each emitter a pool of 200 or so particles that are never released until the emitter dies should yield another decent gain.

12 March, 2007

Weekend Followup

I've been getting a lot of emails about my last post, particularly in regards to Dolphin's garbage collection (some of them generalized this to Smalltalk, which would be wrong), and some regarding garbage collection in games period. Instead of answering each of them or replying to blog comments in a comment, I decided to follow-up with another post....

In a garbage collected language, collection times are a reality. A 30 ms spike isn't horrible (I cringed as I typed that) as long as it's only once every 10-15 minutes. Right now, the collections are occurring roughly every couple minutes. This is unacceptable.

The only real choices are to either turn the collector off and force collections at known "good" times (which for some games isn't an option), write code in a manner that severely limits how often they are needed, or a combination of both. Either way, eventually the memory manager will need to sweep over a large set of objects and see what can be collected. And, while optimizations can be made, this is a problem that will always exist.

Turning off the collector isn't really an option [for me]. I can't make that generalization for all games created with the engine, and I can't generalize when a collection should happen. An individual game can choose to do this, but I won't be forcing that down anyone's throat. But, I do need to reduce how often a collection is needed. This will require special collection classes (pools), some intimate knowledge of the VM, and a certain amount of black-box trust. Sadly, "black-box trust" is something I keep in short supply these days.

Currently, I don't think lots of particles are causing the majority of the GCs - they're just shining a bright light on the real problem(s). My gut tells me that it's the VM not handling short-lived objects well. Particle updating currently uses a lot of Point objects for temporary calculations. The compiler probably doesn't recognize that they can't possibly be referenced outside the scope of the method, and therefore doesn't optimize their creation and deletion.

I need to ping Object Arts before making too many assumptions about the internals, though. I don't know if Dolphin uses mark and sweep, multi-generational garbage collection, simple reference counting, something else, or a combination approach. I could be pretty far off base. Be assured I will post a very detailed analysis of my findings and what I'm doing to take care of the problem.

11 March, 2007

Weekend Developments

I got quite a bit of work done this weekend. Animated sprites are working. They currently come in three flavors: forward, reverse, and ping-pong. They can loop forever, or the animated actor can simply remove itself from the scene when the animation is done. This is useful for short-lived sprite actors, and I'm now using them to add some nice explosions to Asteroids.

After adding animated sprites I worked on getting particles rendering. I decided to go the full-blown route here and really implement particle emitters. A particle system describes how the particles will behave. A particle emitter [actor] decides where and when the particles will be emitted, and the individual particles are finally advanced and rendered.

While there is a little bit of code cleanup left to do, I'm overall very happy with the results. I have added wormhole objects to the game, randomly appearing for 20-30 seconds, attracting the player and asteroids, and gaining intensity with each object that it swallows. A very cool effect.

I've cleaned up the OpenGLPresenter and OpenGLView objects, and they can now be used in the View Composer very easily to render any data desired. This is actually the first step of many to add some very slick game development tools to Dolphin (more on this at a later date), but in the meantime perhaps they will be useful to others as well.

On the downside, I'm finally starting to run into a few barriers that it looks like I'm going to have to address myself.

First up are the collection classes. I've had to create an UnorderedCollection class that will allow for O(1) removal of elements. It works very well, but it frustrates me to have to subclass a built-in class just to override a single function and implement it more efficiently. I rather wish something like this was already in there. I would have preferred to just add a #removeUnsortedAtIndex: loose method, but then #removeAll and similar methods wouldn't have worked as expected. I'm open to suggestions on this.
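The usual trick behind O(1) removal when order doesn't matter is to overwrite the doomed slot with the last element and then drop the last slot. The workspace sketch below shows the idea on a stock OrderedCollection; it is an assumption about the general technique, not the UnorderedCollection source.

```smalltalk
"Workspace sketch of swap-and-pop removal on a plain OrderedCollection.
 Element order is not preserved, which is the whole point."
| items index |
items := OrderedCollection withAll: #(#a #b #c #d).
index := 2.                          "remove #b"

"Overwrite the doomed slot with the last element, then drop the last
 slot. Nothing after index has to be shifted down."
items at: index put: items last.
items removeLast.

items asArray    "Display It: #(#a #d #c)"
```

This also works when index is the last slot: the element is written over itself and then removed.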

Next is the Point class. While I'm sure it works very well for 99.9% of all Dolphin users, it's extremely inefficient for my purposes. Running Ian's profiler reveals that almost 40% of all time is currently spent there. This means I'm going to have to go through a whole ton of code and adjust methods to take x:y: parameters instead. I'm also going to have to alter some of the code in the Point class to improve performance and add functionality (#rSquared and #normalized for use with intermediate results and #normalizedFast for not-quite-exact but really fast unit vectors).

Finally, with the particle system in and running, I've decided to really crank up the rate at which actors (and particles) are created and destroyed to test one other major concern: garbage collection. My own tests reveal that when a GC takes place, it usually takes between 30 and 40 milliseconds to run. This is an enormous amount of time. As progress continues, I'll be sure to post my findings.

07 March, 2007

Extending and Simplifying

Over the past week I've been slowly putting together a rather long post about all the changes that have been made. Instead of trying to make points or talk about experiences, I think this time around I'll just give the updates and explain a couple decisions made along the way.

I'll start with the biggest update: the game engine is now using OpenGL. I still have all my Direct3D9 wrappers (and anyone is welcome to them), but OpenGL offered [for this project] some extremely nice benefits over Direct3D. I'll get to those in a minute. But, for now, know that Dolphin now has a 100% implementation of OpenGL out there, with extension support.

I've also decided to implement 2/3 of an MVP triad (I'm still trying to wrap my head around MVP): the view and the presenter. This not only simplified my code a ton, but I imagine that an OpenGLPresenter would be extremely useful for others in the Dolphin community as well. Probably the smallest example I can give of it in action would be:
p := OpenGLPresenter show.

p makeCurrent
    ifTrue: [(OpenGLLibrary default)
        glClear: GL_COLOR_BUFFER_BIT].
p flip.
Simple enough. There should now be a rather large OpenGL rendering window on the desktop with nothing in it. The presenter can be attached to any view object (within reason I suppose), which means it should be very easy to use in dialog applications or anywhere else that rendering is needed.

Since the ExternalLibrary object uses its own #getProcAddress: for looking up external functions, this made creating OpenGL extensions very easy. Subclassed off of OpenGLLibrary is OpenGLExtension, which overrides the #getProcAddress: method and replaces it with wglGetProcAddress(). Also, it has a class method: #find. To use an OpenGL extension, just subclass the extension class, and then just start defining the instance methods as normal, just as if the extension exists. The #find method will search the extensions available against the class name and return it if found, otherwise nil.
"Declaring the extension class..."
OpenGLExtension subclass: #ARB_multitexture

"Defining one of the methods in the extension..."
glMultiTexCoord2fvARB: mode coords: v
    <stdcall: void glMultiTexCoord2fvARB dword float*>
    ^self invalidCall

"Using the extension..."
ARB_multitexture ifFound: [:ext |
    ext glMultiTexCoord2fvARB: GL_TEXTURE0_ARB coords: ptr].
That's it! Extensions in OpenGL have never been easier (in my experience).

Okay, so why the switch? Well, there were a few different reasons. The primary one is that all games are different. Sure, I'm making a 2D engine, but there are many flavors of 2D games. Each with its own set of requirements and rendering needs. And there's no way I could (or would want to) anticipate all of them.

With Direct3D, there were two big problems. First, D3D is handled entirely through a single interface object (the IDirect3DDevice9). This means that if I (or someone else) down the road wanted to do any special sort of rendering, it would require that I make the device available to everyone. And that brings me to the second problem: D3D state management can be a nightmare. Simply put, I couldn't trust external code to properly manage the device and the state. If the device were to get in a mucked state, it could cause serious problems (including image corruption).

With OpenGL, not only is rendering easier and well understood by most hobby game developers, but state is also managed through stacks. Allowing outside code to push transforms, enable client states, render, then pop the state back to what it was is trivial. The engine won't be making use of client states or vertex arrays - just display lists and primitive OpenGL functionality. This means future games aren't limited by what I do now.

Asteroids does some of its own rendering (stars and thrust particles).

This allowed me to completely rid the engine of the GameRenderer class and a whole host of code that was just a waste of space. It always feels good when extending functionality actually reduces the overall code and what needs to be maintained.

Another benefit that I'll take a minute to mention was shown above in the OpenGLPresenter example. That is, while the game is running I can send OpenGL commands to the game to test ideas from a workspace. This is a wonderful debugging tool to have at my disposal, and I use it constantly to test state, get error codes, or even try out different rendering techniques.

Okay, so what else? It's been a while since my last progress update.

I'm still using DirectInput, and I have joysticks and gamepad controllers working. I've ported over the XINPUT libraries (for Xbox 360 controllers), but have no desire to really get those working in the engine. The interfaces are easy, but significantly different from DirectInput, so I'll just put that on the back burner for now.

Scenes are now composed of Playfield objects. A playfield is basically an Actor manager. A typical scene may have lots of playfields. In fact, the more the better, as they help to manage more than just what's being moved around and rendered. In the asteroids clone, there is a playfield just for asteroids, one for the player and the bullets fired, and one for particles and other short-lived actors. This separation of actors helps to manage z-ordering, collision detection, and more.

Another benefit to using playfields is that actors can now "kill" themselves. This would be akin to removing a renderable object from a scene graph. This is a huge burden off the game developer since actors can be responsible for their entire life cycle. The Asteroid>>explode method takes care of killing itself, as well as spawning new asteroids on the same playfield. And it just works, which is always a good thing.

Oh, wait! I almost forgot another great addition!

With the loss of Direct3D, I lost the D3DX libraries (which I didn't want to use anyway). This meant that texture loading was back to just plain, old bitmaps. But, I decided to code up some nice wrappers for the DevIL project. It's 95% complete. All that's missing are the D3D8, Allegro, and SDL functions, which were intentionally left out. But all 3 libraries (IL, ILU, and ILUT) are implemented, along with a helper object - DevILImage - that is useful for managing images loaded with DevIL and manipulating them.

The engine actually only uses the lowest level of the 3 libraries: IL. But the others are there if anyone would like to make use of them. Thanks to DevIL, the engine now supports an entire host of image formats. Also, Direct3D and OpenGL support for DevIL was pulled out into their own packages, which just add loose methods to the ILUT library. This was done so that if you don't need them, there's no need to install them.

I'm starting to move on to actual interface code now. I've also started playing with Seaside in Squeak a little, as I've done enough that I think it's time to get some web hosting going so I can start publishing packages for others to use and help test. I'm a terrible web developer, but Seaside looks pretty neat and fun. If anyone can suggest some very reliable web hosting services, definitely let me know. Eventually I'd like to use Seaside hosting, but I think it will be a while before I'm ready for that.

Until the next update...