AI to transform world-building

A starting frame for a platform video game generated by Genie. (Image courtesy Google.)

I cover artificial intelligence at my day job. Every week, I talk to the experts building the technology and deploying it, and to companies already finding value in it. The AI-powered transformation is bigger than anything I’ve ever covered before, in my two-plus decades of technology journalism. And it’s moving faster than anything I’ve covered before. And, unlike some other tech trends, companies are almost universally already seeing value in it.

I’m not going to argue here about whether it’s good or bad — I’m going to save that for another essay. Neither am I going to talk, today at least, about the copyright issues and the job displacements and the potential destruction of civilization. Those are all real concerns, but let’s put a pin in them right now and come back to those later.

Today, I’m going to talk about AI and world building. If you build worlds — or want to get into the world-building business, either as a game designer, artist, or writer, or OpenSim creator — here are three ways generative AI will change everything.

Can AI build worlds?

Generative AI is bad. Often laughably bad. It can’t do hands. It’s attempts at writing code fail most of the time. We all know this, we laugh at it, we roll our eyes at people saying that AI is going to change anything except dupe dumb people into falling for even more stupid political spam.

Except — and this is super important — except that AI is learning continuously and evolving fast.

Let me remind you again how far image-generators came in just one year:

Evolution of Midjourney, from version 1 to version 4. (Images by Maria Korolov via Midjourney.)

That was in 2022. AI was soon winning art competitions, and, most recently, the world’s most prestigious photography award.

In 2023, the same thing happened with text. We went from silly little poems written by ChatGPT to AI writing part of an award-winning sci-fi novel. We point to something bad that AI has generated and pat ourselves on the back for being able to spot it so easily. Yes, we can spot bad AI. But we can’t spot good AI.

This year, we’re seeing the same progression happening with video. Remember Will Smith eating spaghetti?

Here’s today’s state-of-the-art, from OpenAI’s Sora model:

So what’s going to happen next?

First, AI is getting consistent. It’s getting a long-term memory. Early versions of AI couldn’t remember what they did before, so text and images and videos were inconsistent. Characters and backgrounds morphed. Stories went in crazy and contradictory directions. Today’s cutting-edge AIs have context windows of up to 10 million tokens. Yup, Google’s Gemini 1.5 model has been tested to accurately handle up to 10 hours of video or enough text for all of the Harry Potter books, seven times over.

Second, generative AI is going multi-modal. That means it’s combining video, audio, text, and code into a single model. So, for example, it can write the text for a story, create a scene list for it, design visuals using an AI-powered image generation tool, create a video for it, and create audio for it, with the result being an entire coherent movie. Yeah, that’s going to happen. The tech companies already have preliminary models that can do most of this, including that Google AI I just mentioned.

Third — and this is the key part of it — the next generation of generative AIs will be able to simulate the world. OpenAI, said just as much in a research paper released shortly after its Sora announcement: “Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.”

Now, today’s models don’t fully understand physics. They don’t know how glass breaks, the direction of time, or that, say, mass is conserved. We can point at this and laugh and think that these models will never understand these things — just like they don’t understand the concept of human hands.

Well, some of the AIs have become really good at making human hands.

You might think that physics would be a bigger challenge. But Google, the company making Gemini, has all of YouTube to train it on. Plus, all our physics textbooks. And all the rest of human knowledge.

According to the OpenAI paper, developing accurate world simulators is mostly a question of making the models big enough.

From the researchers:

We believe the capabilities Sora has today demonstrate that continued scaling of video models is a promising path towards the development of capable simulators of the physical and digital world, and the objects, animals and people that live within them… We find that video models exhibit a number of interesting emergent capabilities when trained at scale. These capabilities enable Sora to simulate some aspects of people, animals and environments from the physical world. These properties emerge without any explicit inductive biases for 3D, objects, etc.—they are purely phenomena of scale.

The authors call this “emerging simulation capabilities” meaning that they appear on their own, without any specific training or interventions. And they list several emerging capabilities, including 3D consistency, long-range coherence and object permanence, and accurate physical interactions.

And it gets better. The authors say that its model is already able to create digital worlds.

Sora is also able to simulate artificial processes–one example is video games. Sora can simultaneously control the player in Minecraft with a basic policy while also rendering the world and its dynamics in high fidelity. These capabilities can be elicited zero-shot by prompting Sora with captions mentioning “Minecraft.”

What does this mean for creators?

Generative AI, like other technologies before it, is a force multiplier. If you can do something, you will be able to do more of it, faster, and, possibly, better.

If you can’t do something, it will give you the ability to do it.

For example, most of us can’t chop down a tree with our bare hands. Give us a knife, and it might take us a while, but we’ll eventually get there. With an axe — we’ll get there faster. With a chainsaw, we can chop down lots of trees. With a swing boom feller buncher you can cut down an entire forest.

I’m not saying that cutting down entire forests is a good thing. Or that you’d want a forest-clearing bulldozer accidentally rolling through your backyard. I’m saying that the technology gives you power to do things that you couldn’t do before.

Yes, we need laws and regulations about cutting down forests, and not letting bulldozers accidentally drive into people’s houses. And yes, these machines did reduce the number of people needed to cut down each tree. I’m not disputing that. All I’m saying is that these machines exist. And if you work in the timber industry, there’s a good chance the company you work with will be using them. And if you’re an individual, you’ll probably still be using your bare hands to pull up tiny saplings in your back yard, or gardening shears to trim bushes, or a chainsaw to cut down full-grown trees.

Similarly, generative AI will dramatically expand the tools available to people who create world for a living. You will still be able to do things the old way, if you want, but the companies you work for — and their customers — will increasingly start demanding them. And, if customers right now are saying things like, “no, never!” tomorrow they’ll be flocking to AI-generated landscapes, AI-powered interactive characters, storylines more intricate than anything possible today.

Future Tools tracks 38 different AI-powered tools for creating video games. TopAI has 70.

Google has released a preview of its own thing, an AI called Genie that automatically generates playable platform games.

Here are just some of the generative AI tools that are on their way, or are already here:

Terrain Generation: AI algorithms can procedurally generate realistic and diverse landscapes, including mountains, rivers, forests, and cities. This can save world builders countless hours of manual terrain sculpting and enable the creation of vast, detailed environments.
3D Asset Creation: Generative AI models can create 3D models, textures, and animations for objects, characters, and creatures. This could greatly expedite the process of populating worlds with diverse and unique assets, from furniture and vehicles to flora and fauna.
NPC Generation: AI can help create non-player characters (NPCs) with unique appearances, personalities, and behaviors. This includes generating realistic dialogue, responsive interactions, and adaptive quest lines. AI-driven NPCs could make worlds feel more alive and immersive. For OpenSim grids, NPCs could provide tours, answer questions, and help populate interactive stories.
Dynamic World Events: AI systems could be used to generate and manage dynamic events within the world, such as weather patterns, natural disasters, economic fluctuations, and political upheavals. This would create a more unpredictable and evolving world that responds to player actions. This could be especially useful for educational grids running simulations.
Procedural Architecture: AI could generate buildings, cities, and entire civilizations procedurally, complete with unique architectural styles, layouts, and decorations. This would enable the rapid creation of diverse and detailed urban environments. I think this could also be useful for building automatic themes for new grid owners. Today, many hosting companies offer starting regions. With generative AI, these regions can be redesigned quickly in different styles. At first, I don’t think this should be done in real-time — the environments will still need human tweaking to be livable. But, over time, the AI-generated stuff will be better and will increasingly be used as-is.
Localization and Accessibility: AI-powered tools could help automate the localization process, translating text, speech, and cultural references to make worlds accessible to a wider audience. AI could also be used to generate subtitles, audio descriptions, and other accessibility features. OpenSim grids have already been using automating translators, for example, with multi-lingual audiences. With generative AI, these tools just keep getting better and faster.

I personally don’t believe that these tools will hurt the video game and virtual world industries. Instead, they will put more power in the hands of designers — making games and worlds more interesting, more immersive, more detailed, more surprising. And bigger. Much, much, much bigger. And it will open up the industry more for indie designers, who’ll be able to produce increasingly more interesting games.

In the long term, at least.

In the short term, there will be disruption. Probably a lot of it. And during these tech disruptions in the past, the jobs lost aren’t the same as the jobs gained — creative jobs, in particular, take time to start paying off.

For example, when newspapers and magazines started laying journalists off after the Internet came along, most journalists found new jobs. Some moved to traditional outlets that were still hiring. Some went into marketing and public relations. A few found new media jobs. And some launched their own publications — they used this Internet thing and launched blogs and podcasts and YouTube channels. A few of them made money at it. But it took years for the new media to gain any respect and credibility and for people working in it to make any money.

In fact, many of the people who made it big in new media were not traditional journalists at all, but new to the field.

Sometimes, people who do things the old way don’t want to change. They don’t think it’s fair that their hard-won skills are no longer as useful. They think that they new ways are lazy or low quality. They might even think that it’s unethical or immoral to do things the new way. That people who, say, cancel their newspaper subscription and get their news online are morally bankrupt and that journalists who enable this are helping to destroy the industry. There are still journalists who feel this way.

We’re probably going to see something similar happening in the age of AI. New tools will pop up putting more power in the hands of more people — power to create art, music, software, video games, even entire books. And you won’t need to spend years learning these skills. Sure, the stuff they create will be bad at first, but will quickly get better as the technology improves, and the skills of people using the tools improve as well. Some of these people will make money at it. Most won’t. But, eventually, best practices will emerge. The sector will gain credibility — money helps. And, eventually, with the exception of a few curmudgeons, we’ll adapt and move on. It will become a non-issue — like, say, using a word processor, or using the Internet, or doing a Zoom call instead of a face-to-face meeting.

Don’t forget that this mix of excitement and apprehension is nothing new. Whenever groundbreaking technologies emerge, they’re met with both enthusiasm and anxiety.

I’m sure there used to be people sitting around a fire saying, “Kids these days. All they want to do is look at cave paintings instead of going out and hunting. Mark my words, these cave paintings will destroy civilization.” Or, “Kids these days. Writing stuff down. In my day, we used to have to memorize odes and sagas. You had to actually use your brain. Mark my words, this writing thing will destroy civilization.” Or, “Kids these days with their fires. Back in my day, we ate our meat raw and were happy about it. Mark my words…”

Yes, there’s a small but non-zero chance that AI will destroy civilization, as was the case with nuclear power, electricity, and even fire.

But I think we’ll get past it, and look back at the curmudgeons fondly, from the safe perspective of a future where we were mostly able to deal with AI’s downsides, and mostly benefit from its upsides.

Things to watch out for

Speaking of downsides, in addition to job losses, there are other potential risks of using generative AI for games and virtual worlds.

They include:

Homogenization of Worlds: If many world builders rely on the same AI tools and datasets, there’s a risk that worlds could start to feel generic or samey. The distinct style and creative fingerprint of individual artists and designers might be lost, leading to a homogenization of virtual environments. On the other hand, we’re already seeing this in OpenSim with the same free starter regions popping up on all the grids, and the same Creative Commons-licensed content showing up in all the grid freebie stores.
Unintended Biases: AI models can inherit biases from their training data, which could lead to the perpetuation of stereotypes or the underrepresentation of certain groups in generated content. This could result in virtual worlds that inadvertently reinforce real-world inequalities and lack diverse representation. On the other hand, AI could also help create greater variations in, say, starter avatars and skins. It all depends on how you use it — but is definitely something to watch out for.
Privacy Issues: In a virtual world, a user’s every interaction with the environment can be recorded and analyzed. Then AI can be used to tailor experiences specifically for each user, creating a more immersive, captivating world. But also — creepy invasion of privacy alert! OpenSim grid owners should be very transparent about what information they collect and how they use it.

OpenSim grids and AI: a plan for action

First, start experimenting with generative AI for the low-hanging fruit: non-vital marketing images, marketing text, social media content, that kind of thing.

Don’t use AI to generate images of what your world looks like in order to deceive people. That will backfire in a big way. Use it to generate logos, icons, generic background illustrations — things that don’t matter to your customers but make your content a little nicer to consume.

Don’t use AI to generate filler text. Use it to turn information into readable content. For example, if you have an announcement, you can take your list of bullet points and turn it into a readable press release. If you’re a non-native-English speaker, turn your ungrammatical scribbles into an engaging, properly written blog post. If you have a video tutorial, turn the transcript into a how-to article for your website — or turn your how-to article into a video script.

Then use AI to turn those useful, informative blog posts, press releases and videos into social media content.

One piece of advice: when creating this content, don’t be generic and impersonal. Add in your personal experience. Show your real face, give your real name, explain how your personal background has led you to this topic. Even as you use AI to improve the quantity and quality of your content, also lean into your human side to ensure that this content actually connects with your audience.

You can also ask ChatGPT, Claude, or your preferred large language model of choice for business and marketing advice. Remember to give it as much information as possible. Tell it what role you want it to play — experienced financial advisor? small business coach? marketing expert? — and provide it with background on yourself and your company, and tell it to ask you questions to get any additional information it needs before giving your advice. Otherwise, it will just make assumptions based on what’s most likely. As the old saying goes, if you assume, you make… and garbage in, garbage out.

Many OpenSim grids have plenty of room for improvement when it comes to business management, marketing, and community building. The AI can help.

Next, start looking for ways that generative AI can improve your core product. Can it help you write scripts and code? Create 3D objects? Create terrains? Generate interactive games? Suggest community-building activities and events? Create in-world interactive avatars?

These capabilities are changing very quickly. I personally stay on top of these things by following a few YouTube channels. My favorites are Matt Wolfe, The AI Advantage, and Matthew Berman.

If you know of any other good sources for up-to-date generative AI news useful for virtual world owners, please let us know in the comments! And are there any specific AI-powered tools that OpenSim grids are using? Inquiring minds want to know!

Source: Hypergrid Business