class: center, middle
background-image: url(stingray.jpg)

# .white[Lua in the Stingray Game Engine]

### .white[APIs, performance hacks and more...]

.white[Niklas Frykholm, Lua Workshop 2015]

---

class: center, bottom
background-image: url(editor.png)

# .white[Stingray]

---

# What is Stingray?

* Multiplatform 3D game engine
* PCs, consoles, mobiles
* Built in Stockholm since 2009
* Acquired by Autodesk in 2014

---

class: center, bottom
background-image: url(hamilton.jpg)

# .white[Hamilton's Great Adventure]

---

class: center, bottom
background-image: url(krater.jpg)

# .white[Krater]

---

class: center, bottom
background-image: url(vikings.jpg)

# .white[War of the Vikings]

---

class: center, bottom
background-image: url(vermintide.png)

# .white[Warhammer: The End Times — Vermintide]

---

class: center, bottom
background-image: url(magicka2.jpg)

# .white[Magicka 2]

---

class: center, bottom
background-image: url(gauntlet.jpg)

# .white[Gauntlet]

---

class: center, bottom
background-image: url(helldivers.jpg)

# .white[Helldivers]

---

class: center, bottom
background-image: url(tango.png)

# .white[Tango]

.white[Augmented Reality]

---

class: center, bottom
background-image: url(clock.png)

# .white[Clock Apartment]

.white[Home Decoration]

---

class: center, bottom
background-image: url(trainstation.png)

# .white[Train Station]

.white[Architectural Visualization]

---

class: center, bottom
background-image: url(revit.png)

# .white[Project Expo]

.white[Show Revit models in Stingray]

---

## Programming games

Different kinds of code:

* "Engine code" -- low level, structured, reused, needs to be fast
* "Gameplay code" -- high level, messy, rewritten for each game, quick iterations

Writing the gameplay code in C++ is problematic:

* Rigid structure
* Hard to do experiments
* Easy to make mistakes
* Long compile times lead to slow iterations
* Abstraction leakage from messy code to structured code

---

# Our solution
* Game objects: `Car`, `Player`, `Score` only exist in Lua
* Language barrier prevents abstraction leakage
* We can change C code freely as long as we keep the Lua API
* All "tricky" decisions can be deferred to Lua code
* Data-driven engine

---

# Why Lua?

* Small, powerful, flexible
  - Easy to learn
  - Easy to hack
* Fast
  - LuaJIT
* Dynamic
  - Reloadable
  - REPL

Main alternatives: C#, JavaScript

---

# Sample Stingray Lua code

```lua
function init()
    self.world = Application.new_world()
    self.viewport = Application.create_viewport(self.world, "default")
    self.shading_environment = World.create_shading_environment(self.world, "core/stingray_renderer/environments/midday/midday")
    self.camera_unit = World.spawn_unit(self.world, "core/units/camera")
    self.sky = World.spawn_unit(self.world, "core/editor_slave/units/skydome/skydome")
    director:init(self.world, self.camera_unit)
end

function update(dt)
    director:update(dt)
    self.world:update(dt)
    if Keyboard.button(Keyboard.button_id("esc")) > 0 then
        Application.quit()
    end
end
```

---

# Lua where? -- Data files

* All gameplay code is written in Lua
* Data files are still in JSON
* Why? Lua is a great "configuration language"
  - We actually don't want the full flexibility of a programming language
  - Because we want editors for artists (not just text editors)

---

# Lua where? -- Visual programming

* We have a visual programming environment (*Flow*) for non-programmers
---

# Lua where? -- Visual programming

* This currently gets compiled to an internal VM representation
* Could compile to Lua code instead
  - Performance?
* Programmers can create Lua nodes for use in the visual language
  - Specify input, output and Lua code

---

# Hot reloading

* Lua code can be hot-reloaded while the game is running
* Reload runs the code again to redefine function implementations
* Can experiment with gameplay without recompiling

```lua
function Player:jump()
    self.speed.y = self.speed.y + 5
end
```

* Change the speed and reload

```lua
function Player:jump()
    self.speed.y = self.speed.y + 10
end
```

---

# Reloading class definitions

```lua
Player = {}
function Player:jump() ...
setmetatable(player, Player)
```

* Re-running the code would create a new class named `Player`
* Old instances would still use the old class/metatable
* We want to change the metatable for existing objects

```lua
Player = Player or {}
function Player:jump() ...
```

* Now reloading will change the existing `Player` class

---

# More issues with reloading

* `init()` is not run again for existing instances
* Fields added in `init()` will be `nil` for existing instances
* Functions stored in variables will not be rebound to new definitions

```lua
self.f = self.jump
```

* Gameplay programmers have to be aware of this and write reload-friendly code

---

# Implementation of reloading

* Custom loader records all files loaded in `package.load_order`
* On reload, files are reloaded in that order with `dofile()`
* Ensures that load order is consistent
* `require()` is ignored on reload, because the files are already in `package.loaded` from the first load

---

# Binding Lua to C code

* We use hand-written bindings
* C++11 lambdas keep documentation, Lua name and implementation in the same place

```cpp
/*
    @adoc lua
    @sig stingray.Application.platform() : string
    @ret string     One of the Application object's platform constants, such as [WIN32] or [ANDROID].
    @des Identifies the platform the application is currently running on.
*/
env.add_module_function("Application", "platform", [](lua_State *L) {
    LuaStack stack(L);
    stack.push_cstring(platform_name());
    return 1;
});
```

---

# Why explicit bindings?

* Quality of the API is the most important thing
* We want the API to feel like Lua, not like C
* We can change the C code without disrupting the Lua API
* Can add new parameters while keeping backwards compatibility
* Auto-bindings with templates lead to long compile times

---

# API documentation

* We use an in-house system (adoc) to document the API directly in the code
* Online documentation is extracted automatically
* Documentation includes type annotations

```cpp
/*
    @adoc lua
    @sig stingray.Application.platform() : string
    @ret string     One of the Application object's platform constants, such as [WIN32] or [ANDROID].
    @des Identifies the platform the application is currently running on.
*/
env.add_module_function("Application", "platform", [](lua_State *L)
```

---

# Binding C++ objects to Lua

* Main goal: performance
  - Minimize creation of garbage
  - Minimize use of lookup tables
* Ownership
  - We want deterministic destruction
  - When a level is destroyed all objects should be too (verified by allocators)
  - Ownership should reside in C
  - Explicit create/destroy from Lua

---

# Binding options

| Method | Comments |
| ------ | -------- |
| Create new userdata every time object is returned | Creates garbage. Tracking lifetime is tricky with many objects. |
| luaL_ref() stored in object | Do references ever get released? C++ objects need to be "Lua aware". What if we have multiple Lua states? |
| Weak lookup table based on object pointer | Lookups are potentially expensive. |
| lightuserdata | Objects lack metatables. Tricky to track type and lifetime. |

* Our choice: lightuserdata + tricks.
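---

# Sketch: what a lightuserdata carries

A lightuserdata is only a raw pointer value, so any type information has to come from the pointed-to memory. The sketch below illustrates that idea without the Lua API: a `void *` stands in for the lightuserdata and the type is read back from a marker stored first in the object. The names `Sound`, `is_type` and `demo`, and the `Sound` marker value, are hypothetical and not taken from the engine.

```cpp
// Sketch only, under the assumptions stated above. No Lua headers needed:
// a void* plays the role of the value stored by lua_pushlightuserdata().
#include <cassert>
#include <cstdint>
#include <cstring>

struct Camera {
    std::uint32_t marker;  // must be the first member of the object
    static constexpr std::uint32_t MARKER = 0x967d35e4u;  // value used in the slides
    Camera() : marker(MARKER) {}
    // Best effort only: clears the marker so that *most* stale pointers
    // fail the check. Not a guarantee, since memory can be reused.
    ~Camera() { marker = 0; }
};

struct Sound {  // hypothetical second type, for contrast
    std::uint32_t marker;
    static constexpr std::uint32_t MARKER = 0x51a0be77u;  // arbitrary value
    Sound() : marker(MARKER) {}
};

// True if the first four bytes at p equal T's marker.
// p must be null or point at readable memory.
template <typename T>
bool is_type(const void *p) {
    if (p == nullptr) return false;
    std::uint32_t m;
    std::memcpy(&m, p, sizeof m);  // read the marker without aliasing issues
    return m == T::MARKER;
}

// Small self-check showing the intended use.
bool demo() {
    Camera camera;
    Sound sound;
    void *a = &camera;  // what Lua would hold as a lightuserdata
    void *b = &sound;
    assert(is_type<Camera>(a));
    assert(!is_type<Camera>(b));
    return is_type<Sound>(b);
}
```

The pointer costs no allocation and no lookup table, which is the performance argument for this option; the price is that type and lifetime checks are only as reliable as the marker.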
---

# Missing metatables

* Since we don't have metatables we have to call functions explicitly

```lua
-- Doesn't work: camera:set_position( Vector3(0,0,0) )
Camera.set_position(camera, Vector3(0,0,0) )
```

* A bit more cumbersome, but not a big drawback
* We don't rely much on "inheritance"
* Fits well with method lookup optimization:

```lua
local Camera_set_position = Camera.set_position
Camera_set_position(camera, Vector3(0,0,0) )
```

---

# Tracking type with lightuserdata

* Every object's memory starts with a type marker

```cpp
class Camera {
    unsigned _marker;
public:
    enum {MARKER = 0x967d35e4};
    Camera() : _marker(MARKER) {...}
    ...
```

* We can test the type of a lightuserdata by examining the marker

```cpp
inline void check_type(lua_State *L, Camera *c, int i)
{
    if (!is_object_pointer(c) || *(unsigned *)c != Camera::MARKER)
        luaL_typerror(L, i, "Camera");
}
```

---

# Tracking lifetime with lightuserdata

```lua
Camera.set_position(camera, Vector3(0,0,0))
```

* What happens if the C object has been destroyed?
* Ideally we would want a Lua error, but this is tricky
  - Objects are "owned" by C++
  - No generic "weak reference" mechanism on C++ side
  - We don't want C++ objects to have to be "Lua aware"

```cpp
~Camera() {_marker = 0;}
```

* In *most* cases, accessing a deleted camera will give a Lua type error.
* But not guaranteed (memory can be re-used or unmapped)

---

# Weak references

* Explicit `create()`/`delete()` works for "big" objects but not for short-lived ones
  - Playing sound instances
  - Particle effects
* Such objects are identified by unique IDs in each system
  - Both from C and Lua
  - IDs work as weak references
  - IDs are just ints, no type information

```lua
local id = SoundWorld.play(world, "boom")
SoundWorld.set_volume(id, 0.9)
```

---

# Weak references for game objects

* For certain "important" types (Unit, Entity) we want both weak references and type
* We use a lightuserdata with type information in the lower bits
* Since our pointers are 4-byte aligned we can use the lower bits to encode type
  - 00 - Ordinary pointer (4-byte aligned)
  - 01 - Unit id
  - 10 - Entity id
* For unit/entity, the remaining 30 bits (on a 32-bit system) encode an ID

---

# Temporary objects

```lua
local a = 2 * b + Vector3(1,0,0)
```

* Can potentially generate huge amounts of garbage
* How can we keep the syntax without generating garbage?
* Our solution:
  - Temporary objects are *temporary*
  - Only valid in the current frame

---

# Implementing temporary objects

* On the C++ side, stored in a big ring buffer
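
| Marker | Vector3(0,0,0) | Marker | Vector3(1,0,0) | Marker | Vector3(0.5,0.7,1) | ... |
| ------ | -------------- | ------ | -------------- | ------ | ------------------ | --- |
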
* Represented by lightuserdata pointing into this buffer, just as other objects
* When we flip frame, we change old objects' markers to a "stale" version
* Using a `Vector3` from a previous frame gives a Lua error
* To store a `Vector3` between frames you need to box it in a full userdata:

```lua
player.speed = Vector3Box(player.speed:unbox() - Vector3(0,0,9.82)*dt)
```

* A bit cumbersome, but a big performance boost

---

# Operator overloads

* Since `Vector3`s are lightuserdata, they cannot have individual metatables
* We implement operators (+, *) by adding them to the shared metatable for lightuserdata

```cpp
void LuaEnvironment::set_light_userdata_metatable(const char *module)
{
    lua_pushlightuserdata(L, 0);
    lua_getfield(L, LUA_REGISTRYINDEX, module);
    lua_setmetatable(L, -2);
    lua_pop(L, 1);
}
```

---

# Lua performance

* Garbage collection
* LuaJIT
* Multithreading

---

# Garbage collection

* Games are soft-realtime applications
  - Stalls from garbage collection are unacceptable
  - Ideally we want to spend a constant time collecting every frame
* GC is terribly unfriendly to the cache
* Memory allocation and deallocation are costly
* It is easy to write code that generates tons of garbage
* Our philosophy:
  - API should never force garbage on the user
  - So we use lightuserdata for "almost everything"

---

# How expensive is garbage collection?

* `\(S\)` memory swept per frame. `\(G\)` garbage generated per frame. `\(F\)` frames required to complete a sweep. `\(M\)` non-garbage memory. `\(\alpha\)` acceptable garbage fraction.
* To complete a sweep: `\( SF = M + GF \implies S = \frac{M}{F} + G \)`
* Acceptable total garbage: `\( GF < \alpha M \implies M > \frac{GF}{\alpha} \)`
* Implies: `\( S > \frac{GF}{\alpha F} + G \implies S > \left(1 + \frac{1}{\alpha} \right) G \)`
* Cost is proportional to garbage generated. Trying to keep the overhead `\(\alpha\)` low is costly.

---

# Driving incremental garbage collection

* How much should we sweep per frame?
  - Want to spend a constant amount of time sweeping every frame (no spikes)
  - Sweep too little -- will run out of memory eventually
  - Sweep too much -- unnecessary performance loss
* Strategy
  - Do a fixed number of ms of garbage collection every frame
  - Expose `global_State::estimate` with a `LUA_GCESTIMATE` query
  - Estimate `\( \alpha = \frac{totalbytes}{estimate} - 1\)` for current GC cycle
  - If ` \( \alpha > \alpha_{target} \)` increase GC time proportionally
  - Force collection if memory is low (consoles have no virtual memory -- hard limit)

---

# LuaJIT

* Can give huge performance boost
* One of the fastest JIT compilers for any language
* Problem:
  - The platforms that most need jitting (consoles, iOS) don't allow it
  - Not likely to change
  - For a multiplatform game, code usually has to be written with slowest target in mind
  - ` \( \implies \) ` Only PC exclusive titles get the benefits of jitting
* LuaJIT in interpreter mode is still faster than regular Lua (~30 %)

---

# LuaJIT FFI

* Allows calling C directly from LuaJIT
* Should give a performance boost by bypassing stack manipulation
* In our tests, we haven't seen big performance improvements
* On platforms without JIT, FFI has *much worse* performance than the traditional interface
* We have an interest in keeping the platforms as similar as possible
  - ` \( \implies \) ` No FFI for now

---

# Slightly scary: Lua / LuaJIT divergence

* Lua is at 5.3 now, LuaJIT at 5.1
* Increasing divergence
* Will Lua & LuaJIT separate into two distinct languages?
* Is that a problem?

---

# Slightly scary: Mike Pall is not BDFL

* Has stepped down -- not Benevolent Dictator For Life
* Much of LuaJIT's success comes from his excellent work
* Future leadership (and quality) of LuaJIT is uncertain
* Can be used as-is for many years to come...
  - ...but may eventually get outdated without clear leadership

---

# Scary stuff: Multithreading Lua gameplay code

* No obviously good way of doing it
* Will soon become a necessity (Amdahl's law)
  - Pressure on main thread is increasing
* Multithreading gameplay code is hard
  - Sprawling code, touches all kinds of systems
* Since Lua is not multithreaded -- requires separate Lua state on each thread
  - Increased memory use
  - How is state synchronized between Lua states?

---

# Multithreading models

* Shared memory & locking
  - Too complicated, easy to make mistakes
  - Requires language support currently missing in Lua
* Software transactional memory
  - Hmmm...
* Actors and message passing
  - Easy model to reason about, fewer mistakes
  - Algorithms split up over multiple messages become messy
* Actors with implicit messaging

---

# Actors with implicit messaging

* Code for each actor is written as usual

```lua
local my_pos = (Unit.position(unit_a) + Unit.position(unit_b)) / 2
```

* If you call a function on an actor owned by a different thread
  - Your coroutine is suspended
  - Thread continues with updating the next actor
* When all updates have completed
  - Process "foreign" function calls
  - Resume suspended coroutines

---

# Multithreading Lua in Stingray (the future)

- Main thread (for simplicity) combined with actor model
- Implicit/explicit still undecided
- Implicit model allows you to write code "as before"
- But maybe it hides too much of what is *really* going on

---

class: center, middle
background-image: url(stingray.jpg)

# .white[Q & A]

.white[Thanks for listening!]