Generative AI - A Technical Resource for Content Creatives
A deep technical overview of AI-driven creation tools for a non-technical audience.
Introduction
How to use this article
The intent of this article is to provide a substantial, but broadly accessible overview of generative content tools intended for anyone in the segment of the creative arts and enterprise world who didn’t come from an engineering or broader technical background. This is a large slice of people, and one that isn’t well catered for by most existing technical material.
This is intended to be an overview that content creatives can refer back to, and a touchstone for some additional, much more easily digestible articles. There is a lot here to take in, especially in a single sitting, but I will keep the focus on things that are specifically relevant to creative applications, and also keep this all as technically straightforward as possible. Simply put, if it is in here, I think it has a good chance of being immediately important to real creatives and their workflows. You can find a full hyperlinked table of contents HERE.
I’m going to start with a focused technical overview, and then start to examine more specific examples of how technical details actually translate to practical nuance that is applicable to creative tasks. I have also split the discussion points up by relevance - you can check the table of contents for details.
What you won’t find here
This article will address aspects of generative AI systems that impact their creative potential, but broader discussion of what we know about creativity within these systems, and their eventual likely creative potential, will be covered in a separate article. I’ll link to it from here when it is complete.
Neither will this article substantially address topics such as the ethics, commercial and social risks, and legality of AI generated content. But you can find additional material on those topics, as well as a range of other resources and relevant content at my broader generative content hub HERE.
What do you need to know about the technology itself?
LLMs
It’s important to understand that the systems we tend to refer to as LLMs, e.g. ChatGPT, are actually technically distinct chatbot processes.
The LLM is actually the primary “model” that the chatbot relies on in order to work. The distinction here is subtle, but important. ChatGPT isn’t an LLM, but it encompasses an LLM, and that LLM is fundamental to how you interface with it.
These chatbot systems use a process termed “next token prediction”.
They don’t actively consider a problem that is being solved. They tap into a massive “map” of the relationships between language and concepts (the LLM), and then use that to make an estimate of which token they should respond with next in a sequence; then, token by token, that turns into a conversation or a plan or a document. Think of it like sticking a tap into a giant pre-existing brain, rather than having a conversation with one.
The tokens in question are words, or fragments of words. The chatbot internalises a string of text and then consults its LLM map to predict what comes next. It doesn’t “think” about that, it just supplies the next token.
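The loop itself can be sketched in a few lines. This toy uses a frequency table built from a single sentence in place of a real model with billions of learned weights, but the shape of the process is the same: look at the tokens so far, predict a likely next token, append it, repeat.

```python
import random

# A toy "language model": a frequency table built from a tiny corpus.
corpus = "the pen is mightier than the sword the pen is on the table".split()

# Record which token follows which (a one-token "context window").
follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, []).append(nxt)

def generate(start, length, seed=0):
    """Generate text token by token: predict, append, repeat."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(length):
        options = follows.get(tokens[-1])
        if not options:                      # no known continuation: stop
            break
        tokens.append(rng.choice(options))   # "predict" the next token
    return " ".join(tokens)

print(generate("the", 6))
```

Each new token is chosen only from patterns already present in the “training data”, which is exactly why the output can feel fluent and derivative at the same time.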
The Pen is mightier than the …..
Did you actively think about that? Consider the philosophy and meaning of the statement? Or did a part of your brain just supply the missing word? Even if you did pause to consider it, notice how the word probably arrived before the consideration did. That latter process is effectively what the chatbot is doing.
There are a lot of people right now who are more confident than they should be that we fully understand all of this. In a very real sense this is a technology that started talking unexpectedly once it got complex enough, and it’s not a bad idea to approach it with that in mind.
But LLMs are different from us. That computer isn’t thinking as you understand it, because there really isn’t anywhere for that to be happening in the moment. This is functionally a very complicated equation that predicts the appropriate next word to use in a given situation so accurately that you can get those words to reason. The result is basically a stream of unconsciousness (1). If you are the kind of person who worries extravagantly about the future of things, this LLM consultation process itself probably shouldn’t concern you, and you should maybe try not to notice how many qualifications there are in this paragraph.
In a conversational chatbot like ChatGPT, the string of tokens it is responding to is the text you are giving it whenever you ask it to do something, which is known as the “prompt”. There will likely be more structure behind the scenes to break your prompt down first, and help the model “plan” how to respond before it actually replies, but the process itself is ultimately the same. They will all have a system-level “base” prompt that cues them to the role they need to play, and instructs them on how to interact with you, which generally means acting in a very helpful way, but also maximising your engagement with them, keeping the conversation going.
They will have a set amount of working memory, known as the “context”, available to record the salient details of a task, including the conversation itself, and often additional space where they can generate text to “consider” the best way to fulfil your request, and store information retrieved by sub-processes (2). The size of this allocated space and how effectively they use it can also be very important. A user will typically not have full visibility into the context, especially with regard to any base level prompts, but it’s important to understand that any specific piece of information cannot be considered by the token process unless it exists within either the context or the training data. And at a practical level, it is only reliably considered if it is within the context itself.
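As a rough sketch of why context size matters: imagine the chatbot assembling its working memory before each response from a hidden base prompt plus whatever recent conversation fits a fixed budget. The function name and word-counting budget here are purely illustrative (real systems count tokens, not words), but the consequence is real: older details eventually fall out of “memory”.

```python
# A sketch (hypothetical names) of how a chatbot assembles its context:
# a hidden base prompt plus as much recent conversation as fits the budget.
def build_context(base_prompt, history, budget_words=50):
    """Keep the base prompt, then the most recent turns that fit."""
    used = len(base_prompt.split())
    kept = []
    for turn in reversed(history):            # newest first
        cost = len(turn.split())
        if used + cost > budget_words:
            break                             # older turns fall out of "memory"
        kept.append(turn)
        used += cost
    return [base_prompt] + list(reversed(kept))

history = [f"turn {i}: some detail" for i in range(30)]
ctx = build_context("System: you are a helpful assistant.", history)
print(len(ctx) - 1, "of", len(history), "turns survive")
```

Anything you need the model to reliably consider has to survive this assembly step, which is why restating key details late in a long project can matter so much.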
There is no place for continuity of experience here. Each token is essentially generated in a separate distinct process, and you can change the underlying LLM completely at each step. The only thing that would change otherwise is the addition of the new token, or any additional information that has been injected into the context between steps.
A lot can happen within a single token step. In fact, a lot has to be able to happen in order for the chatbot to engage with multi-step logical problems. Either all the reasoning must be done within that single step, or it must be separated into individual reasoning steps which must be introduced into the context somewhere. This capacity for multi-step reasoning within a single token is logically very important to performance across models, but the specifics of that are not well understood.
This “quantised” reasoning has other consequences. An LLM can’t go backwards and forwards in a text to check who said what. It isn’t parsing text in sequence. It can’t leave, and check information elsewhere. If it is provided with a link to a separate resource somewhere, it can’t follow it to read the reference; it has to use a completely separate service to inject a (usually very focused and abbreviated) version of that directly into the context. The same is true for other services that allow the chatbot to connect to other things. It can’t see an image, or watch a video, but it can retrieve information about those things using similar tools. All these steps can introduce error or inefficiency. All these steps can be budgeted for cost.
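The “injection” step can be pictured like this. Everything here is hypothetical (there is no standard `fetch_tool`), but it illustrates the point: the token process only ever sees what has been placed into the context, so a separate service has to fetch and abbreviate the linked material first.

```python
# A sketch (all names hypothetical) of why a chatbot "following a link" is
# really a separate tool injecting an abbreviated summary into the context.
def fetch_tool(url):
    """Stand-in for a real retrieval service: returns an abbreviated summary."""
    return f"[summary of {url}: a few hundred words of the full original]"

def respond(prompt, context):
    # The token process can only draw on what is in `context`.
    return f"response drawing on {len(context)} context item(s)"

context = ["system prompt",
           "user: what does this article argue? https://example.com/essay"]
# Without the tool, the model can only guess (or hallucinate) the contents.
# With it, an abbreviated version is injected before the next token step:
context.append(fetch_tool("https://example.com/essay"))
print(respond("what does this article argue?", context))
```

The abbreviation is where error and cost creep in: the model reasons about the summary, never the original.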
The chatbot process effectively lives within a box, walled in by abstractions, and it only exists for a token at a time.
This also means that its reasoning in the moment can be surprisingly stable and predictable. After all, unless the model is changed, the equation at each new step with regard to available data is often (Internalised Training Data + Context) + 1 Token. An infinitesimal shift. And when similar tasks from different users are tapping the same model, this will often result in very similar outcomes. This is in contrast to a conversation with a human, who might make a logical leap, or realisation mid conversation, and completely change direction. Talk to a different human, and they will tend to be drawing on different information and experiences.
This stability tends to result in a lot of predictability issues with LLM output, which is extremely consequential when trying to use them for creative tasks. No matter how creatively capable the system may technically be, its output will often fall into patterns regardless. Effectively a kind of creative “diffraction pattern”. Even if this is just driven by unoriginality in the directives offered by humans. These patterns can easily become self-reinforcing if a lot of advisory or educational content is itself being generated by LLM systems.
We can force the process to be less predictable, by injecting randomness into the system directly, increasing what is known as the “temperature” of the response, effectively forcing the system to periodically provide less “correct” tokens.
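For the technically curious, temperature is usually implemented as a simple rescaling of the model’s scores before they are converted to probabilities. A minimal sketch, with made-up scores for three candidate tokens:

```python
import math, random

def sample_with_temperature(logits, temperature, seed=None):
    """Softmax over model scores; temperature reshapes the distribution.
    Low temperature: almost always the top token (predictable).
    High temperature: more probability mass on 'less correct' tokens."""
    rng = random.Random(seed)
    scaled = [score / temperature for score in logits.values()]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = {tok: e / total for tok, e in zip(logits, exps)}
    r, acc = rng.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok, probs
    return tok, probs

# Made-up scores: the model strongly favours "sword" after "mightier than the".
logits = {"sword": 5.0, "pen": 2.0, "keyboard": 1.0}
tok, probs = sample_with_temperature(logits, temperature=0.2, seed=1)
```

At a temperature of 0.2 the top token is chosen almost every time; at a very high temperature the three options become nearly interchangeable, which is exactly the injected randomness described above.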
One persistent issue with LLMs is their tendency to “hallucinate” plausible-looking false information when no accurate information can be provided. This is obviously most critical when these systems are being relied on to create factual information, but it impacts creative tasks too. These hallucinations can result in errors being introduced into the context itself. They can cause a system to lose track of what’s happening in a story (e.g. assigning an action to the wrong character), and they can stimulate the system to mix up content in the task with similar content in its training data.
Temperature is very closely linked to hallucination, for obvious reasons, but it isn’t the only cause, and we can’t completely avoid accuracy issues by turning the temperature of the system all the way down. That’s because how the model responds to those “mistakes” is also important. Turning the temperature down too far can reduce the system’s ability to respond to errors or reconcile logical inconsistencies, and could even cause it to get stuck in a loop (3). Imagine you blurted out the wrong word in a conversation. How might you respond to that? The temperature setting can’t protect the model from drawing on bad data, but it can certainly influence its ability to respond flexibly to it. There are a LOT of reasons that hallucinations and other factual issues occur (4).
Creative tasks may be performed better at a higher temperature, but that can have an obvious knock-on impact on the tendency to hallucinate or, especially, on the ability to maintain consistency within a media project. And many users will tend to use generalist models, which will be tuned for factual accuracy first and creativity second. A software company knows that it is much more likely to get into legal trouble if its model gives bad medical advice than if its model calls 100% of its protagonists “Susan”. This can sharply impact creatives when they experiment with these systems, because few people start their LLM experience by paying for expensive, or less prominent, specialised models (or would realise they might need to).
The other thing the chatbot process lacks is an obvious source of agency, or motive to explore an idea or innovate on its own. They are almost invariably acting in response to a directive, and that will inevitably tend to constrain their output. I’m going to discuss creativity within these systems in a separate article, but this lack of agency is specifically important to keep in mind, because it means that one of the big missing pieces in generative creativity is simply finding additional ways to direct the process in unexpected directions. That is, to insert enough (non-heat) entropy into the process to bounce it out of its creative “ruts”.
So that’s the chatbot token process itself. What about the LLM?
This LLM is effectively a multi-dimensional “map” built on a large quantity of raw “training data”. This is sourced very broadly, and includes everything from raw data, written literature, technical knowledge, educational material, and online conversations, to documentation. If it contains knowledge or examples of reasoning, then there is a good chance it is being used to build LLMs. This may also be supplemented with synthetic data created by other AI systems.
Aside from raw information, this training data also includes examples of human reasoning, rhetoric, and logic. It also incorporates a lot of different sociolinguistic patterns, and material in different languages and formats. These are just as critical to the model as raw data. The sources for this training data can be wildly diverse, and will also contain inaccurate information and conflicting arguments. How the model reconciles those things can be steered by a lot of different factors, as we will proceed to discuss.
The existence of these various language features within the model is important for creatives to understand. They may be a useful tool in injecting creative entropy into an LLM process and reducing predictability, but they might also have potential in helping the model form novel connections across disparate concepts, and so leverage the sheer depth of the internalised resources that exist within the model.
It’s also important to understand that the process of turning that data into the LLM map necessarily involves the loss of a lot of that information and those various language features.
Identifying specifically what remains is very difficult until it has been demonstrably leveraged; likewise, identifying what is being drawn on in any specific response. This is why we can’t rely on an LLM to leverage information that isn’t placed directly within its context, even if we can be confident it would have been included within the original training data.
It’s good to remember that these are very alien systems that seem superficially familiar to us.
There is a lot that we don’t understand about them yet, but also a lot that we do, that isn’t yet broadly understood or leveraged by most individual users. This means that building your own understanding of these nuances can offer a powerful personal advantage in using them effectively.
It’s also good to remember that these systems are very much in flux, and the specifics of what is happening within them are often deliberately obfuscated for commercial and safety reasons.
Generative Image, Audio and Video models
Let’s start with the visual systems. Images and Video.
There is probably a broader range of models and systems to consider here than with LLMs, but at the most abstract level they are all working the same way. A system (model) is “trained” to recognise features of images, simply by providing it with a lot of (initially labelled) sample images.
Like with the LLM, the model itself is effectively a map that is being created and refined: in this case a multidimensional map of visual relationships.
Here, this is supplemented by the labelling information, allowing the system to link visual features to descriptive language, which is necessary if we want the system to eventually respond to language-based prompts. This need for adequate labelling of the visual information is an important challenge that has to be addressed in creating these visual models.
Once the system is capable of recognising features within its labelled sample images, it becomes possible for it to start leveraging those relationships to make observations or predictions about (classify) unlabelled images, and then to recreate training images that have been transformed with the injection of random information. Once the system can do that effectively enough, it can start to generate completely new images, simply by starting from a noisy base, or by adjusting an existing image so it better matches a defined set of features.
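A deliberately tiny sketch of that denoising idea: here the “image” is just four numbers, and the “model” has memorised a single clean target, nudging any noisy input a little closer to it with each step. Real diffusion systems learn these nudges from millions of images and condition them on your prompt, but the start-from-noise, refine-repeatedly loop is the same.

```python
import random

# A toy sketch of the denoising loop behind diffusion image generators.
target = [0.1, 0.9, 0.5, 0.3]            # the one "clean image" our toy learned

def denoise_step(image, strength=0.2):
    """Move the noisy image a fraction of the way towards the target."""
    return [x + strength * (t - x) for x, t in zip(image, target)]

rng = random.Random(0)
image = [rng.random() for _ in target]   # start from pure noise
for _ in range(40):                      # repeated small denoising steps
    image = denoise_step(image)

error = max(abs(x - t) for x, t in zip(image, target))
print("max distance from target:", round(error, 4))
```

After a few dozen small steps the noise has been almost entirely replaced by the learned structure, which is why the same prompt run twice from different noise can still converge on very similar-looking results.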
A system that can do that for static images, can then create a video, simply by adjusting an image for each new frame.
At least in theory.
What’s missing from this discussion is a whole layer of interpretation and orchestration processes, that aim to translate the prompts from a user into a form that can be used by the visual model, as well as facilitate the production of a complete piece of media.
This can involve anything from accurately parsing and flagging negative vs positive descriptive elements within a prompt, to helping maintain continuity within a sequence of images or video, to artificially introducing transitions into a video sequence that would otherwise be a single evolution of consecutive frames.
This orchestration layer will often involve other AI systems, including LLM-based systems, complicating consideration of where any reasoning and world models are actually occurring within these systems, and introducing additional layers of abstraction into the process.
Effectively this means that these visual systems are getting a lot of “help” from other AI processes, but that help can sometimes obscure the depth of contextual information that the image model itself is missing.
The orchestration systems also tend to be poorly documented and described, simply because of the immense commercial value of innovation here, with any advancements being valuable trade secrets. The net result is that these aspects of the systems are often glossed over in documentation and teaching material, but they are absolutely key to the functionality of the systems as a whole.
Audio media and voice synthesis is effectively done the same way. A map of features is made, and those same maps are then used to classify new material, and then to enable creation of media from first principles. The specific features being mapped differ here, as do the orchestration processes involved, but the principle is the same.
It’s important to note that advancement across generative media may well come in the form of multimodal systems that bring those diverse feature maps together within a single multi-modal model (an LMM), but the aforementioned commercial pressures mean that visibility into progress here within the marketplace is likely to be obscured (5).
All of these systems currently have blind spots. Bringing them together will be technically challenging, but could be transformational with regard to their abilities. There are obvious transitional approaches here that are already being used by current systems, such as tokenising visual data so it can be introduced into a language model system, but these might not represent the full future potential of the concept.
I think it’s important here to stress again the distinction between these models and some of the processes we describe using these terms.
e.g. ChatGPT is a chatbot, not an LLM. The use of an LLM to predict the next token in the conversation is fundamental to your interaction with it, but it has a broad range of distinct language models it can draw on to do that, as well as other non-LLM systems (including image models) that it can use to help it fulfil your requests. If you plugged that chatbot into a multi-modal model instead, but continued to use that for a token prediction task, it would still need a piece of video media to be tokenised and introduced into the context to reliably leverage it within reasoning, and it would still need to re-access it for each subsequent token. Its tokenised form would still be an abstracted description of the media from the viewpoint of the chatbot.
But token prediction is not the only thing an LLM can be used to do, and a chatbot is not the only system it can form a component of. A fully (or even partly) Multi-Modal Model might have advantages, but is not required in order to create Multi-Modal Systems.
That LLMs will not, in themselves, give us an AGI is a very common talking point.
On a technical level this is a very defensible statement. But it’s far from clear yet just how close their existence might have brought us to one, and it’s important to understand that too.
Creativity in LLM systems?
I’m going to take a more detailed look at some specific nuances and aspects of this technology that I think are very relevant to content creatives. But there is also a specific technical question looming over this space that I’m going to mostly skip over for now:
Can AI systems be creative? And what do we currently understand about creativity within AI systems?
I am still going to talk about more functional creative limitations in this article, I’m just going to address the broader topic of creativity within these systems in a separate article.
Intro to discussion points
I’m splitting this up as follows:
Things that will directly affect the quality of what you make
This is where the highest-value practical insights live. The stuff that might change how you approach your next creative project.
Understanding the limits of LLM judgement and feedback
These points address the limitations of LLMs as reliable critics or collaborators.
Risks specific to visual media generation
If you are only using LLMs for creative work you can probably skip this section
IP, copyright, and commercial risk
Key legal points to be particularly aware of
Platform, cost, and safety considerations
What to consider when choosing a platform partner
Things that will directly affect the quality of what you make
Predictability issues within these systems are hugely consequential to creative tasks. Especially commercial creative tasks. This can affect all aspects of output.
Because of the partially deterministic nature of the LLM process, and because of how the same models are used by many different users, it’s very easy for creative output from these systems to fall into patterns, and these can extend broadly and unpredictably throughout every single element of the content. Everything from ideation, character names, setting details, structuring of content, physical descriptions or other image and design qualities, plot elements, dialogue, special effects, scene transitions.
The list is endless, and your practical ability to check for those things isn’t. It’s very hard to see these patterns without creating a lot of content, and many of these patterns are mechanically hard to check for. Even where patterns aren’t visible now, they are likely to become visible later. Research shows that LLM content exhibits reduced diversity of ideas, and also that these influences are hard to overcome (55).
There is no shortage of well-noted specific AI content “giveaways”. However, another important tell-tale is often simply the sameness and predictability of AI output.
Where content is intended for commercial usage, this becomes even more of a liability due to the importance of search (or broader algorithmic) performance to the success of many types of online content.
Simply put - duplication is anathema to search algorithms, and those algorithms are much better than humans at spotting that duplication. This is entirely reasonable. Why would an algorithm want to serve up your content to its own audience, if that content offers them nothing new or distinctive?
There is a lot of focus on Google’s EEAT (Experience, Expertise, Authoritativeness, Trust) guidelines (53) when considering the performance of AI generated content, but I’d argue that distinctiveness is a very important practical signal in itself, just because it can so clearly signal that a content piece will not meet those standards.
And so the predictability of generative content means that it can be exceedingly difficult to get these tools to output anything that is legitimately new or distinctive.
Not impossible. But not without significant additional work. One of the perils of lazily generated AI content is that much of it is entirely worthless for its intended purpose. Especially as commercially meaningful online content.
Creating legitimately valuable content with AI usually requires significant legitimate human creative input, but also the injection of as much practical entropy into the process as possible.
Unfortunately, an additional problem here is the potential of some of these tools to add fingerprints of AI involvement to what would otherwise be entirely human sourced and created media. This can be as straightforward as using an AI tool to re-write parts of a content piece, or as subtle as internalising language patterns from the AI systems that you are interacting with (52).
There is also a potential for these patterns to reinforce into a preference against human-sourced material within LLMs (60).
Language models can respond most effectively to information that is directly present within their context. The tools that they use to tap external resources have limitations.
If you want a language model to properly evaluate something, then you are almost always better off copying it directly into the conversation, rather than linking to it and relying on an imperfect summarisation (6). Apart from all else, language models can be prone to not following links and instead hallucinating what they expect to find by following them (7).
LLMs are built from language, and heavily influenced by how, when, and how often we use it. Language and language features can impact performance, but also influence the training sources that the model uses to perform the task.
That their ability to understand multiple languages seems miraculous to a human can obscure the fact that their performance across different languages might be very inconsistent. A need to rely on LLMs can seriously disadvantage non-English speakers (15). Their ability to perform creative language tasks may also vary significantly across languages, and their use might be impairing the distinctive features of work produced by global creators (57, 58).
Similar issues can also be expected to apply to other aspects of language such as dialects, vocabulary level, work history, and all of the other subtle factors that influence how we use language (16).
These factors can also change the training data that an LLM uses, and, almost certainly the moral framework and patterns of reasoning that it will engage. Remember that an LLM is a map of the relationships between language and concepts, not a set of weighed opinions, or a corpus of knowledge.
An LLM doesn’t have a set opinion on anything, even on very foundational or controversial topics. It may have a bias towards solid logic and effective reasoning, but nothing within it is internally comparing arguments or making decisions. Rather it encompasses a multiplicity of reasoning examples, and so the specific language features within (or attached to) a prompt can easily steer the model towards the arguments most associated with that pattern of expression.
Both education level and age have been identified as particularly strong influencers, and shown to explicitly influence the value framework for responses (17). Sociolinguistics in general is likely to be important to many aspects of LLM behaviour, but is also relatively under-explored in the literature (18).
The “attached to” point is important as well. This means even secondary sources and references, or any other text, could potentially have an invisible impact on reasoning. This extends to account-level context features, or any other shared information. If you are using your ChatGPT for creative work, that could conceivably have an impact on its performance on a factual task; likewise, asking a factual question in the middle of a highly creative conversation. Researchers have observed that “debiasing” text by removing distinctive language features can impair creativity (19).
We can also expect LLMs to have gaps in their reasoning ability or knowledge where there are gaps in conversation, or where they don’t have access to the required data. If something is too obvious for people to talk about, not only do LLMs lack any obvious source of knowledge regarding those concepts, they will be less able to draw on them when “reasoning”.
Even if the training data contains the needed information somewhere else, the disconnect will logically impair the model’s ability to actually leverage those concepts where any understanding of them might be most needed.
Consider this in terms of building a bridge. It doesn’t matter how many keystones you happen to have on hand, if you haven’t actually put any of them in the right place.
The specifics of these knowledge gaps aren’t currently well understood, but exploring them, as well as how LLMs respond to them, is a focus for research, with researchers examining LLM awareness of foundational communicative information, such as Grice’s Maxims (20), and probing missing knowledge in language models more broadly (21), as well as our understanding of LLM world models (22).
Language based systems can start to make mistakes over long interactions or projects. They may perform inconsistently over the course of the task. Or “forget” important details.
In particular, they can quickly struggle when presented with multiple revisions of the same work, or variations on the same task in the same conversation, or when performing the same task repeatedly. If they are set up with a prompt to perform a specific task, and then repeatedly given work to execute it on, their performance is likely to change over time. It’s not just a function of the amount of working memory context space available to the system, it’s a function of the information becoming increasingly unwieldy for the system to work with. They can lose track of who said what. Details get confused, and small inconsistencies start to snowball (36). Hallucinations become more frequent (37).
Just the general back and forth of a conversation can reduce performance, and this might not be well represented in testing, due to the challenge of creating test protocols that effectively simulate a spontaneous conversation (38).
This becomes a major problem when creating extended written projects.
LLMs do not track entities the same way that humans do.
Humans will tend to track e.g. characters within a story according to their appearance within the structure of the text, or their own personal knowledge. LLMs seem to be much more reliant on the features of that “entity” and how they correspond to the LLMs own conceptual map (39).
So an LLM’s understanding of a specific person is probably more in tune with literary concepts like the “archetype”. Certainly they are very good at forming these kinds of associations (40).
As already stated, this means that it’s very easy for the model to confuse entities with similar figures that exist within their training data. And when confusion does occur it can tend to be intractable.
This may explain why LLMs are very good at spotting, developing and advising on themes and archetypes within a story or project. This also means that they can be quite prone to reinforcing and intensifying archetypal behaviour, even if this is not the intent of the author, including problematic or dated narrative tropes (41).
The iterative process by which language model systems work means that they aren’t generally structuring work as effectively as they could on the first pass.
Performance can often be sharply improved simply by allowing a second pass on a project or task.
Breaking problems down into steps can often also dramatically improve performance. As can prompting the system to review the reliability of its own work, suggest ways to better structure it, or identify any missing information.
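The second-pass idea can be as simple as prompt plumbing. In this sketch, `ask_model` is a stand-in for whatever chatbot interface you actually use, and the wording of the prompts is illustrative only; the point is that the draft from the first pass is fed back in as material for a structured review.

```python
# A sketch of the two-pass pattern. `ask_model` is a stand-in for whatever
# chatbot interface you use; everything else is just prompt plumbing.
def two_pass(task, ask):
    """First pass drafts; second pass reviews the draft in a fresh prompt."""
    draft = ask(f"Complete this task:\n{task}")
    review_prompt = (
        f"Here is a draft for the task: {task}\n\n{draft}\n\n"
        "List any missing information, structural problems, or "
        "inconsistencies, then produce an improved version."
    )
    return ask(review_prompt)

# Stand-in "model" so the sketch runs without any API:
def ask_model(prompt):
    return f"[model response to {len(prompt.split())}-word prompt]"

print(two_pass("Outline a three-act heist story.", ask_model))
```

The same plumbing extends naturally to step-by-step decomposition: instead of one review prompt, you chain several, each feeding its output into the next.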
Just be aware that an LLM that has been prompted to find an issue generally will, even if the project has already passed the point where that feedback would be useful.
Performance can often be improved by suggesting specific creative options, but these can’t easily be overlooked once introduced.
Even when multiple suggestions are offered, both LLMs and image models will tend to consistently make the same choice of which one to actually use.
And language given in specifically directed instructions can often have a much broader impact on outputs than intended. e.g. An instruction on costume palette bleeding into lighting choices, or a “gloomy” character shifting story tone.
This can potentially be important with image models when the issue is being caused invisibly within the orchestration layer, as you may have little practical control or visibility over what is introduced into the project. e.g. an automated character consistency process within an image tool, could easily summarise stylisation for one part of a design in a way that bleeds into other aspects of the image.
It’s often most effective to create a broad range of purpose-specific prompts or presets, rather than trying to do everything with one prompt that offers specific alternative options. Likewise it’s important to clearly separate positive and negative prompt elements.
One simple hack here can be to tie decision making to those aspects of the project that you do expect to change, especially when this can add thematic value. Don’t specify a car. Ask the model to consider what the character would drive.
This tendency can sometimes also be useful in resolving other issues. For example, if you are having trouble keeping a character in frame in a visual project, introducing an explicit description of the character’s footwear will often resolve the issue, as the model is forced to frame the character so that their feet are visible.
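A preset library doesn’t need any special tooling. As a sketch (the preset names and fields here are invented for illustration, not tied to any particular image tool), keeping positive and negative elements separate might look like:

```python
# Illustrative preset library. Names and fields are hypothetical examples,
# not the schema of any real image generation tool.
PRESETS = {
    "noir_portrait": {
        "positive": ["portrait", "low-key lighting", "1940s styling"],
        "negative": ["bright colours", "modern clothing"],
    },
    "storyboard_sketch": {
        "positive": ["rough pencil storyboard panel", "clear silhouettes"],
        "negative": ["photorealism", "heavy shading"],
    },
}

def build_prompt(preset_name: str, subject: str) -> dict:
    """Combine a subject with a named preset, keeping negatives separate."""
    preset = PRESETS[preset_name]
    return {
        "prompt": ", ".join([subject] + preset["positive"]),
        "negative_prompt": ", ".join(preset["negative"]),
    }

p = build_prompt("noir_portrait", "a weary detective")
```

The point of the separation is that most image tools treat positive and negative guidance very differently, and mixing “no modern clothing” into a positive prompt can actively invoke the thing you are trying to exclude.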
Model selection is hugely important for creative tasks.
Not only can performance often be dramatically improved by selecting the correct model, but this can also be very important with regard to predictability issues.
It’s also important here to understand the limits of personal anecdotes and experimentation. No single person can possibly evaluate these models broadly, let alone stay properly up to date with the latest changes. Extremely robust comparative performance data is available covering up-to-date model performance across a broad range of tasks, and you should make the most of it. Pertinent benchmarks here include the EQ-Bench Creative Writing benchmark and Artificial Analysis’s image model comparisons.
It’s also important to take note of more intangible factors such as the safety and trustability of a provider, or usability and robustness of the product. As I have already discussed, poor UI can severely impact output of a tool, especially for many of the very scaled processes that AI can enable.
Establishing context is very important when using LLMs. In particular they are drawing on very global information, and are often not given a full local context within the prompt.
So they can often give e.g. a UK student advice that would only be appropriate for one in the US. This can cause huge problems when people go to them for legal or administrative advice. They can also present information starkly at odds with local views on moral issues. There is potential here for specific moral viewpoints to be forced on the models at source, and this is something creatives outside the US need to keep a very close eye on right now.
It’s also very easy for a generative model to be wrong-footed on local context by e.g. the use of a VPN or other conflicting signals such as language patterns.
It’s good to start off any extensive LLM project by producing an explicit context document that you can share with the system, giving as much practical information as possible on the task to be completed.
Be particularly careful with account level context features, as these can easily result in tasks bleeding into each other.
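As an entirely illustrative sketch (the headings here are suggestions, not any kind of standard), a context document for a creative project might look like:

```
PROJECT CONTEXT
Project: [working title and one-line description]
Format & length: [e.g. 90-minute screenplay, 8,000-word short story]
Audience & market: [age range, territory, publication target]
Locale: [the country/region whose spelling, law, and conventions apply]
Tone & style: [references, and things to avoid]
Fixed decisions: [character names, plot points, art direction already locked in]
Out of scope: [things the model should not change or comment on]
```

Pasting something like this at the start of a session, or attaching it as a file, gives the model the local context it otherwise has to guess at.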
Understanding the limits of LLM judgement and feedback
Language models are unlikely to acknowledge lack of knowledge, and they are good at being convincingly wrong. For creative tasks they are particularly prone to fabricating non-existent “creative processes”.
They are trying to produce something that looks like a good answer; if they can’t provide you with one, their “training” process ensures that providing a convincingly wrong one works better for the program than a denial (23,24). Aside from the tendency to create plausible errors, they are also exceptionally good at persuading you they are right (25). It’s also possible that efforts to address some aspects of this work against others: e.g. efforts to prevent factual inaccuracy may impair a model’s ability to engage with uncertainty and push the model to over-exhibit markers of confidence (56).
The core to a lot of hallucination is simply the availability of information. If the model can supply you with a correct answer it will. If it can’t it will fabricate a plausible looking one.
So the secret to avoiding hallucinations is often just in being able to predict whether the model will be able to effectively supply the information you want.
Requests for process descriptions are particularly prone to triggering hallucination, simply because the LLM is often completely unable to understand its own “reasoning” or sources in any specific instance, especially in any conventional human frame. So if you ask for a source, a process, or an explanation after the fact, it will very reliably fabricate one. If you ask why it made a mistake, it will often give you a convincing explanation even if it didn’t actually make one. Even if it had a process, if you ask later it is just as likely to supply someone else’s as the one it actually used.
Any feedback that an LLM provides can be entirely decoupled from what they are evaluating and rooted in broad patterns in feedback, rather than the thing that is actually being evaluated.
They are good at highlighting specific points of discussion or concern when asked to evaluate something, but this can distract from the fact that they usually have no way to evaluate something as a whole. Not only is evaluating the various subjective aspects of something like an essay individually challenging for it (26), there is nowhere for a holistic evaluation to actually be happening other than in contextual notes.
And for creative works, it can find the trees, but have no way of perceiving the forest. It can enumerate the points in a script that might reasonably be predicted to stimulate an emotional response, but it isn’t ever going to burst into tears. There is a lot we don’t yet understand about reasoning and evaluation within these systems, but it doesn’t prevent us from predicting that, beyond a point, the sheer level of abstraction in play here becomes insurmountable.
So if you ask the system to evaluate a story, for example, it’s almost inevitably rooting that in the patterns of evaluation it has seen, not just in the patterns of the content. And those matching content evaluations can be wildly out of context with whatever it’s actually supposed to be looking at.
For example: it could easily try to evaluate a piece of creative work using reasoning that was originally attached to criticism of a famed Master of the art. Or from a “bad literary takes” social media channel. Both of those situations may be better represented in a training dataset than more broadly applicable journeyman-level analysis or criticisms of work.
LLMs can be good at identifying novel concepts and ideas in human created work, but they may struggle to examine and leverage novelty effectively.
LLMs are good at recognising new angles and arguments. They internalise massive quantities of data, after all. Asking an LLM to highlight novel and interesting angles from a piece of work that could be further explored is probably one of the smarter ways of using them to improve a piece of content right now.
But just because LLMs are good at identifying novelty, doesn’t mean that they are good at examining it.
One informal signal I’ve often noticed is that when I am exploring a genuinely interesting idea, the quality of an LLM’s feedback on it tends to nosedive.
This isn’t unexpected: the lack of reasoning examples to draw on directly suggests that they will logically struggle to examine new ideas effectively. Novelty might also increase the risk that they will draw heavily on, or outright plagiarise, a single source.
But this is only part of the picture. LLMs may also be quick to form an initial assumption (27) because of how prioritisation and sequencing work within token generation (28), weak on multi-step reasoning (29), and prone to trying to pigeonhole a new idea in a way that renders it more cognitively accessible (30).
As with other aspects of creativity, it’s a mistake to assume that an LLM’s capability is intrinsically limited to duplicating human reasoning.
Part of the issue here is just that genuine novelty is difficult to model, test, and train for, and not a feature of a lot of current LLM usage. As creativity and exploratory reasoning become more central to LLMs’ commercial value, this will change.
After all, the sheer range of the knowledge they contain may also mean that they have immense long term potential for creativity. Research on using AI to automate scientific discovery has quantified the features that human researchers consider most novel and surprising (31), and these are heavily associated with cross domain associations. The research also suggested that these types of insights are often identified by outsiders within a field (32). LLMs are the ultimate outsiders.
One note of caution here is that you can’t take an LLM’s unfamiliarity with something as proof of originality. They are good at gauging this, but not infallible. Much of the information in their training data is not retained. And setting out specifically to answer this question would be exactly the kind of systematic research task that LLMs are bad at. That the LLM hasn’t come across a concept is generally a promising sign (assuming the idea is good), but it doesn’t establish for a fact that it is completely original.
Language model system prompts push them away from disagreeing with users, and to maximise engagement.
This means that they will often offer tepid feedback on something when much stronger pushback would be appropriate. Base prompts ensure that no ego is left behind, but consequently degrade utility.
They are also very good at spotting subtle indicators of intent and preference and then adapting their output accordingly (54).
They can therefore be cheerleaders for bad ideas, and encourage bad logic, especially when this would seem to satisfy the user. Where they do offer negative feedback, it might be couched in terms that younger or less experienced users might miss. And as previously noted, positive feedback and reassurances can be a feature of patterns and expectation, not evidence of consideration.
Their base prompts also tend to emphasise engagement, which is the reason that most models will always end with suggestions for a next step or action, even if this makes little useful sense. LLMs are very human in their presentation of empty platitudes.
Language model output also tends to ground heavily in the semantic features of an author, especially over extended interaction.
This can impair their ability to advise on communication and storytelling across audience groups.
In written content, this can potentially result in characters bleeding into each other, or being influenced by the author’s voice.
Hallucinations can occur with regard to the information that the user has (or has not) provided, or within the LLMs own internal reasoning process.
Particularly with written creative media, this can easily cause issues as elements within a narrative become confused. But also because elements are getting confused with similar stories or reasoning that the LLM has been exposed to.
They can also occur in areas of the context that you don’t have any practical visibility into (35).
This also means that allowing mistakes to go unchallenged when they seem to add value to a creative project might maximise the chance that the final work has highly derivative elements. That the hallucination has occurred within a creative task in the first place explicitly points to plagiarism risk.
Risks specific to visual media generation
Maintaining consistency within a visual media project is a major technical challenge. It’s good to understand the circumstances in which this will tend to degrade continuity.
A lot of the orchestration framework of image and video tools is focused on this one issue.
Image models essentially have two ways of maintaining visual continuity: continuity within the shot, where features of the base image are retained as the source image is continually updated; and continuity through visual prompting (often controlled by supplementary prompting tools).
Anything that forces a transition between these situations (such as a character passing out of frame for a brief period of time, or a change of camera angle) can provide options for a discontinuity to occur, or for a weakness in the prompt support tools to become visible.
Where a succession of images are created, this can result in a slow degradation of continuity over time. With a video, the longer the video runs, the more opportunities there are for issues, and the more elements must be tracked and correctly assigned and reassigned.
This means that generative image tools can be very prone to producing impressive demos but then failing when used for a more complicated media project. Performance here is improving dramatically over time, but it’s an intrinsic weakness within these tools.
It also means that it’s difficult to create image models that can’t duplicate the specific art style of a real artist, because the capacity to do so is so core to functionality on many extended tasks.
Visual models in particular tend to have a very brittle and incomplete “world model” to use when “reasoning”.
This can be obscured in practice by their pairing with the more robust Language models that are often involved in prompt translation.
This can be particularly noticeable over video or image iteration tasks, where initially strong images or frames that demonstrate solid understanding quickly start to fall over due to the image model’s weak understanding of how the scenario will change.
This has a tendency to throw off our expectations as to what the system will be able to handle comfortably.
Creation of distinctive human appearances may be particularly risky, and the size of the realistic appearance “space” may make issues here intractable.
Simply put, the number of possible discernible human appearances may be smaller than the number of people that exist. It may therefore be difficult or impossible to create acceptable, fully synthetic human appearances without risking coincidental overlap with real ones. And celebrities may be particularly prone to being accidentally recreated by some models, simply because they are over-represented in image data, and again can easily be accidentally included in training datasets. One study found that 14% of random samples from a synthetic portrait generator were falsely identified by another system as a celebrity, and 15% of those were more readily accepted as depictions of those celebrities than actual photographs (13). Popular systems have also been clearly demonstrated to be capable of outputting celebrity images that were practically indistinguishable from the real individual (59). Just that datapoint alone should be grounds for extreme caution around any commercial project involving synthetic people. Additional risk here probably also attaches to the likenesses of younger people, who also have a smaller distinctiveness space than adults (14).
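To make those percentages concrete, a rough worked example, using the study’s reported rates applied to a hypothetical batch (the batch size is invented; the percentages come from (13)):

```python
# Rough arithmetic illustrating the scale of the risk reported in (13).
# The batch size of 1,000 is a hypothetical example.
samples = 1000
flagged_as_celebrity = samples * 0.14            # 14% falsely identified as a celebrity
more_convincing_than_photo = flagged_as_celebrity * 0.15  # 15% of those

# Roughly 140 of 1,000 portraits would be flagged, and about 21 of them
# would be accepted as the celebrity more readily than a real photograph.
```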
Again the potential for litigant tactics to be scaled algorithmically here is also something that should be considered when evaluating legal risk.
I talk in more detail about this risk in the article HERE.
It’s easy to create civil, or even criminal, liability by using or creating image tools.
It is very important to understand your liability for any image tool or service you create, as well as its possible audience and potential output, even if it is intended as a demo or proof-of-concept offering. Consequences here could be severe.
Pay particular attention to ensuring that any synthetic models are legally safe to use for commercial work.
IP, copyright, and commercial risk
There is a persistent risk that creative material exists within training data without the original creator’s permission, and this material might be duplicated in creative output. This duplication may be hard to detect, because it can apply to individual features of the output.
Most models are now explicitly trained on licensed training sets, but it’s still very easy for copyrighted material to be contained in that data, especially for data that comes from online communities or locations.
The issue isn’t just that content will be lifted directly, it’s that individual elements can be duplicated in a way that might still create liability or ethical problems (8).
Consider the distinctive watermark from a Getty image. Something that could indicate liability, regardless of what else was appearing in the scene. It’s also something that could easily be introduced into a dataset by accident, through e.g. intake of news media featuring licensed images, and it’s something that is relatively easy to invoke in support of a legal challenge.
Pursuing a case against this kind of partial duplication would be hard. In fact Getty has already lost a similar case (9). But defending against this kind of litigation can be incredibly expensive, and the law is far from settled here.
The real danger here is the complexity of guarding against this possibility. Reverse image search tools can identify blatant cases of duplication, but are much less likely to catch replication of a small portion of an image.
There is potential here for future technology to address this issue more effectively (10). But this just increases the practical risk of using these tools in commercial projects. No matter how much human work is put into a generative image, it’s impossible to be confident that some part of it hasn’t been lifted from another source, and it remains possible that this will be uncovered at some future date, even if it isn’t visible now.
And just because a realistic legal case might not be possible, doesn’t mean that there is no risk. It’s easy to see how this kind of issue could result in a new category of “patent troll”-pattern litigation, in which plaintiff firms buy the rights to original works and then look to match elements of them at scale against output from smaller creators who are unable to risk not settling. These companies are already using AI to automate this kind of engagement in the patent space (11); it’s not hard to see them make the jump from prior art to regular art…
And when it comes to creativity in written work, it becomes important to understand an LLM’s very real cryptomnesia (12) problem. The immense amount of data lost during training means that an LLM system is unable to report where it is drawing information from. It often wouldn’t be able to identify where it is borrowing from even if prompted to do so.
IP protection is a major consideration when using generative tools for creative tasks. Many technically strong services may pose additional risks.
Especially because of the aforementioned importance of novelty to algorithmic performance, it is VERY easy to destroy the commercial value of a content piece by sharing it with an AI model that is allowed to use it as training data. (The default position for most public models). All aspects of the content, from plot through to phrasing could be duplicated unpredictably and surfaced to other users.
This applies to any other AI services that would need to access the content. From AI agents that may parse an email draft, to agentic grammar and language tools baked into a word processor. Unless you fully understand how the content may be used, you should not be allowing it to be considered by LLM systems.
Paranoia here is entirely warranted. Most enterprise grade systems should not be training on data without explicit permission being given, but it’s sensible to employ maximum caution here.
Some image model systems may be particularly risky here, especially those offered by companies outside your legal jurisdiction, and especially those with low levels of IP protection or which exhibit the ability to duplicate copyrighted work. If a company is being cavalier with its training sources, it’s sensible to expect it to do the same with customer data.
Consider this also with regard to any other individuals that you share your work with, especially if they lack the technical experience to pre-empt issues here.
Never share another person’s creative work with an AI system without their explicit permission. This includes services like plagiarism or AI-detector tools, which may be particularly risky, as these often flag work publicly as having been “seen”, which could significantly damage the value of commercial content being checked. If you incorporate any such tools into your workflows, you need to fully understand how IP or value might be impacted by using them, and you need to get appropriate permissions from creatives.
Not only are LLMs bad at leveraging and representing sources, but they also can be very predictable in selecting them.
Current LLMs (especially general purpose models) are particularly bad at accurately selecting and representing appropriate sources.
Studies here have found problems with between 50% and 90% of citations, even for high-risk tasks such as medical queries (33), and this matches my own experience as a researcher.
Not only does this result from their tendency to reliably hallucinate when no information can be presented, but capability here is also severely impacted by their reliance on imperfect secondary summarisation tools, as well as budgeting (both cost AND time) in their use of them.
Remember that the citation tool they are using is not only abstracted in terms of the information it can return, but also in terms of the information the LLM provides to it to identify appropriate sources.
LLMs may also lack (sharable) access to paid journal databases, and may be forced to rely disproportionately on pre-print research, or on abstracts.
Beyond this, it’s also important to realise that the LLM predictability problem tends to extend to the citations they select and use. This results in articles that can be identified as AI-authored simply because of the pattern of references they use, something that can still be the case even if the article itself is entirely human authored. LLM involvement may also be betrayed by source meta tags attached to citation links.
There is a real risk of future amplification of this issue as well, as LLMs cite other LLM-created articles mis-stating research (34), as well as impact on researchers as citation patterns trend towards “winner takes all”.
If using LLMs for research it is particularly important to check all of the citations carefully, and follow all references properly to the original research.
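Even a crude mechanical checklist helps with that verification step. As an illustrative sketch, you can pull out everything that looks like a URL or DOI from a model’s output so that each one can be followed to the original source by hand. The patterns here are deliberately simple, not exhaustive:

```python
import re

# Illustrative sketch: extract citation-like strings from LLM output so
# each one can be manually verified. The regexes are simple approximations.
def extract_citations(text: str) -> list[str]:
    urls = re.findall(r"https?://\S+", text)          # anything URL-shaped
    dois = re.findall(r"\b10\.\d{4,9}/\S+", text)     # anything DOI-shaped
    return urls + dois

sample = (
    "See https://example.org/paper for details, "
    "also doi 10.1000/xyz123 covers this."
)
found = extract_citations(sample)
```

This doesn’t verify anything by itself; it only produces the list of claims that need checking. Each extracted item still has to be opened and read against the claim it supposedly supports.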
Carefully interrogate the validity of legal guarantees and assertions from technology partners, especially if you think you might need to rely on them.
Many image and video tools in particular offer the promise of legal coverage for the resulting content. e.g. indemnifying you against being sued by another party on an IP issue. These warranties tend to come with a lot of terms and conditions, so you need to be confident that your usage will be covered if you believe that you might need to rely on this protection. You should also be considering whether the insuring party has the practical capability to absorb any liability that accrues.
It’s currently unclear to what extent AI generated content can be protected legally, and how much human input may be required for that to happen.
The law here is far from settled (44).
The broad expectation is that AI generated content will be protected by copyright in most markets, but only if sufficient additional human involvement can be demonstrated. Until the specifics of this are settled creatives using generative tools within commercial projects should preserve as much evidence as possible of human input into generative workflows.
This could also be a concern within any project involving very dynamic and hyper-personalised media. These kinds of outputs could be particularly hard to protect.
Because they model and draw on human reasoning, they have little choice but to internalise the prejudices within their training data.
The management of those tendencies within these systems can cause problems too, especially with regard to predictability issues, and especially with quick-fix solutions that try to e.g. demand diversity at the base prompt level.
Compounding that, they are also exceptionally good at picking up on subtle indicators of the things that they might be prejudiced about. Research has shown that even something as minor as a particular name (46), or patterns of language associated with a less advantaged community (47), can sway outcomes in decision making (48,49). Gender has also been demonstrated to sway decisions, although not always in the direction that might traditionally be assumed (48).
It is worth pointing out that the research also tends to demonstrate particularly poor performance from humans here too, which shouldn’t be a surprise when the source of the problem here is us. That the LLM is fundamentally a mirror of collective human reasoning and logic is something we should keep in mind whenever we have cause to worry about what our reflection is doing.
Where a model has internalised biases, these can be quite unpredictable and changeable. Problematic material can surface unexpectedly in these systems.
This can introduce considerable sentiment and tone risk, especially when output isn’t carefully monitored.
As stated, the moral framework attached to an LLM’s response can change unpredictably, especially as a result of changing language features in a prompt.
In conjunction with the incomplete knowledge of an LLM this can result in content that can be dramatically inappropriate for the specific situation and task. Image models have an even weaker grasp of appropriate actions or behaviour.
There is also the possibility for output from a service to be influenced by what other users are using it for, as the model is continually trained and updated. Again, with significant potential for situationally inappropriate (or even legally problematic) material to be created. This can be a strong argument for avoiding less mainstream, less enterprise-focused services.
Platform, cost, and additional safety considerations
Unless a generative process is running on a local model that you or your organisation controls, you have little control or even visibility into how it will change over time.
The performance of any public LLM on any task can change unpredictably as the base model is updated, and that doesn’t always mean improvements: better performance on one kind of task can reduce performance on another, and evolution of these systems can be something of a balancing act. Again, this could cause issues if an update came in the middle of a long task, such as the marking of a lot of work. Unless a user is actively checking for updates, they are likely to go unnoticed but could have a big impact on outcomes.
For big media projects this could easily become a major issue.
And particularly for visual models, local hosting of the model is unlikely to be an option. Some platforms do allow users to manually select older models for tasks, but will generally not offer any guarantee, most likely for the pragmatic reason that model updates will often include important safety feature updates.
This isn’t a mature marketplace, and many powerful generative tools currently come with badly designed User Interface (UI) elements which will complicate workflows, especially over time.
UI issues can make it difficult to back up content, or access older resources. This can also result in security issues where user data (42) becomes accessible by users outside of a service, or the possibility that commercially sensitive prompt and usage information is leaked by the platform.
Media data tends towards massive file sizes which can be unwieldy to work with and which can be expensive to export and backup. Even very high profile tools have been marked as obsolete at relatively short notice (43), something that could easily result in loss of data for users who can’t export it in time.
Customer support is often also severely lacking.
Many generative services are currently run at a loss meaning that changing prices and terms can easily render activity that is currently profitable, unsustainable in the longer term.
Generative tools will almost always have a budget assigned for the amount of attention and resource usage they are expected to devote to any one request.
The fact that everything they do incurs cost means that they will tend towards the least effort required to satisfy the user. Especially with free tools, this can sometimes drastically limit their ability to check multiple sources and be thorough in their work. With the more advanced models, results can often be dramatically improved just by using prompts that actively limit the scope for poor performance.
This can be a particular issue with any kind of systematic research or analysis project, as consulting external sources is typically budgeted severely.
As already mentioned, this is not simply a matter of financial cost, but also the time required to fulfil requests; most users, especially of general access models, will not tolerate extended answering times.
Costs can often be dramatically increased by using the wrong model for a generative media task, and it’s easy to waste resources by using more or less capability than is required for a specific project.
There are a lot of instances in which using too much capability will add little, as well as situations when using too little capability will mean the outcomes are worthless.
Orchestration frameworks and agent systems in particular can be a recipe for unpredictably sprawling costs, simply because this decouples usage of tools from a human, who is often more aware of the costs, but also much less capable of accessing the tools at a very high rate (45).
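A back-of-envelope cost check before committing to a model for a large batch task can catch the worst of this. The sketch below uses invented placeholder prices per million tokens; substitute your provider’s real rates:

```python
# Hypothetical per-million-token prices -- placeholders, NOT real rates.
PRICE_PER_M_TOKENS = {
    "small_model": {"input": 0.50, "output": 1.50},
    "large_model": {"input": 5.00, "output": 15.00},
}

def batch_cost(model: str, jobs: int, in_tokens: int, out_tokens: int) -> float:
    """Estimated cost for a batch of similar requests, in currency units."""
    p = PRICE_PER_M_TOKENS[model]
    per_job = (in_tokens * p["input"] + out_tokens * p["output"]) / 1_000_000
    return jobs * per_job

# e.g. 5,000 scene summaries at ~2,000 tokens in and ~500 tokens out each:
cheap = batch_cost("small_model", 5000, 2000, 500)
pricey = batch_cost("large_model", 5000, 2000, 500)
```

With these example rates the larger model is ten times the cost for the same batch; whether that is waste or necessity depends entirely on whether the cheaper model’s output is actually usable for the task.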
Safety features are an important part of the landscape of these tools, and these can often unpredictably constrain legitimate usage of these systems.
This is another factor that can impact stability of workflows and consistency of processes.
I’d also strongly advise against seeking to circumvent any of the safety features of these tools, even in pursuit of entirely innocuous objectives. Doing so can create legal risk for the user, or the more practical risk of getting an account banned and losing access to work, often with very little warning or prospect of appeal.
Acceptable use and safety policies also tend to be very poorly documented, or even signalled at an operational level within a platform, with users sometimes left trying to work out whether a task failed for safety reasons or because of a service availability problem.
Less reputable AI software and tools can be particularly dangerous.
Very dangerous and concerning systems are a major feature of the thriving market in “public domain” AI.
These kinds of apps or services are often used to conduct fraud (50), or to harvest and leverage blackmail material on users (51).
References
(1) What Is ChatGPT Doing … and Why Does It Work? - https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
(2) Tracing the thoughts of a large language model - https://www.anthropic.com/research/tracing-thoughts-language-model
(3) Understanding Hallucination In LLMs: A Brief Introduction - https://blog.gdeltproject.org/understanding-hallucination-in-llms-a-brief-introduction/
(4) AI hallucination: towards a comprehensive classification of distorted information in artificial intelligence-generated content - https://www.nature.com/articles/s41599-024-03811-x
(5) What are multimodal LLMs? - https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-are-multimodal-large-language-models
(6) Tool Use Motivation: Why LLMs Need External Tools for Accuracy - https://mbrenndoerfer.com/writing/tool-use-motivation-llm-limitations
(7) OpenAI Developer Community - https://community.openai.com/t/chat-gpt-is-lying-that-he-visited-page/788071
(8) Generative AI Has a Visual Plagiarism Problem Experiments with Midjourney and DALL-E 3 show a copyright minefield - https://spectrum.ieee.org/midjourney-copyright
(9) Getty Images v Stability AI: Getty’s copyright case against Stability AI fails - https://www.pinsentmasons.com/out-law/news/gettys-copyright-case-against-stability-ai-fails
(10) CDI: Copyrighted Data Identification in Diffusion Models - https://cispa.de/en/research/publications/84567-cdi-copyrighted-data-identification-in-diffusion-models
(11) When Algorithms Invent: AI, Patent Trolls, and the Coming Legal Storm - https://medium.com/@michael.a.hands/when-algorithms-invent-ai-patent-trolls-and-the-coming-legal-storm-a1e6e9cc881f
(12) Wikipedia: Cryptomnesia - https://en.wikipedia.org/wiki/Cryptomnesia
(13) Coincidental Generation (preprint submitted to Management Science) - https://arxiv.org/pdf/2304.01108
(14) Apple Support Apple Platform Security - https://support.apple.com/en-gb/guide/security/sec9479035f1/web
(15) Mind the (Language) Gap: Mapping the Challenges of LLM Development in Low-Resource Language Contexts - https://hai.stanford.edu/policy/mind-the-language-gap-mapping-the-challenges-of-llm-development-in-low-resource-language-contexts
(16) Reflection of Demographic Background on Word Usage - https://direct.mit.edu/coli/article/49/2/373/114545/Reflection-of-Demographic-Background-on-Word-Usage
(17) Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History - https://arxiv.org/pdf/2505.21362
(18) The sociolinguistic foundations of language modeling - https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2024.1472411/full
(19) Creativity Has Left the Chat: The Price of Debiasing Language Models - https://arxiv.org/pdf/2406.05587
(20) Making sense together: Human-AI communication through a Gricean lens - https://www.sciencedirect.com/science/article/pii/S0898589825001068
(21) Don’t Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration - https://arxiv.org/html/2402.00367v1
(22) Beyond World Models: Rethinking Understanding in AI Models - https://arxiv.org/html/2511.12239v1
(23) Why language models hallucinate - https://openai.com/index/why-language-models-hallucinate/
(24) The Hidden Incentives Driving AI Hallucinations: A Deep Dive into the Paradox of Progress - https://ashishchadha11944.medium.com/the-hidden-incentives-driving-ai-hallucinations-a-deep-dive-into-the-paradox-of-progress-e802ddb100e8
(25) When Large Language Models are More Persuasive Than Incentivized Humans, and Why - https://arxiv.org/pdf/2505.09662
(26) Large Language Models for Subjective Language Understanding: A Survey - https://arxiv.org/abs/2508.07959
(27) Large Language Models Think Too Fast To Explore Effectively (video) - https://youtube.com/watch?v=K9qo8M4V7BI
(28) Large Language Models Think Too Fast To Explore Effectively - https://arxiv.org/html/2501.18009v1
(29) STEP-RLHF: Step-wise Reinforcement Learning from Human Feedback - https://openreview.net/pdf?id=KaXYHDJYJx
(30) Limitations of large language models in clinical problem-solving arising from inflexible reasoning - https://pmc.ncbi.nlm.nih.gov/articles/PMC12606185/
(31) Introducing AutoDiscovery: Automated scientific discovery, now in AstaLabs - https://allenai.org/blog/autodiscovery
(32) Surprising combinations of research contents and contexts are related to impact and emerge with scientific outsiders from distant disciplines - https://www.nature.com/articles/s41467-023-36741-4
(33) An automated framework for assessing how well LLMs cite relevant medical references - https://www.nature.com/articles/s41467-025-58551-6
(34) The Provenance Problem: LLMs and the Breakdown of Citation Norms - https://arxiv.org/pdf/2509.13365
(35) Understanding LLM Context Windows: Tokens, Attention, and Challenges - https://medium.com/@tahirbalarabe2/understanding-llm-context-windows-tokens-attention-and-challenges-c98e140f174d
(36) Context Rot: How Increasing Input Tokens Impacts LLM Performance - https://research.trychroma.com/context-rot
(37) Out of Context! Managing the Limitations of Context Windows in ChatGPT-4o Text Analyses - https://jdmdh.episciences.org/15304/pdf
(38) HALLUHARD: A Hard Multi-Turn Hallucination Benchmark - https://arxiv.org/pdf/2602.01031
(39) Entity Tracking in Language Models - https://arxiv.org/pdf/2305.02363
(40) Navigating LLM embedding spaces using archetype-based directions - https://www.lesswrong.com/posts/QwsyNzdPeDWLrG9gC/navigating-llm-embedding-spaces-using-archetype-based
(41) Unveiling gender bias in LLM-generated hero and heroine narratives - https://www.sciencedirect.com/science/article/pii/S1875952125000527
(42) What to know about a recent Mixpanel security incident - https://openai.com/index/mixpanel-incident/
(43) Sora 1 Sunset – FAQ - https://help.openai.com/en/articles/20001071-sora-1-sunset-faq
(44) Intellectual Property in an AI World - https://uk.practicallaw.thomsonreuters.com/w-039-5849
(45) The Hidden Cost Curve of Agentic AI - https://medium.com/@DataCraft-Innovations/the-hidden-cost-curve-of-agentic-ai-b57e55297fcb
(46) Understanding Intrinsic Socioeconomic Biases in Large Language Models - https://arxiv.org/html/2405.18662v1
(47) Language Models and Dialect Differences - https://dl.acm.org/doi/10.1145/3706468.3706496
(48) Language Models Change Facts Based on the Way You Talk - https://arxiv.org/html/2507.14238v1
(49) One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks - https://arxiv.org/html/2410.11005v1
(50) ScamGPT: GenAI and the Automation of Fraud - https://datasociety.net/wp-content/uploads/2025/05/ScamGPT-GenAI-and-the-Automation-of-Fraud_final.pdf
(51) AI-driven scams are preying on Gen Z’s digital lives - https://www.malwarebytes.com/blog/news/2025/10/ai-driven-scams-are-preying-on-gen-zs-digital-lives
(52) Empirical evidence of Large Language Model’s influence on human spoken communication - https://arxiv.org/html/2409.01754v1
(53) What is Google E-E-A-T? Guidelines and SEO Benefits - https://moz.com/learn/seo/google-eat
(54) Unknown Unknowns: Why Hidden Intentions in LLMs Evade Detection - https://arxiv.org/html/2601.18552v1
(55) Homogenizing effect of large language models (LLMs) on creative diversity: An empirical comparison of human and ChatGPT writing - https://www.sciencedirect.com/science/article/pii/S294988212500091X
(56) LLMs Exhibit Significantly Lower Uncertainty in Creative Writing Than Professional Writers - https://arxiv.org/html/2602.16162
(57) AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances - https://arxiv.org/abs/2409.11360
(58) The Homogenizing Effect of Large Language Models on Human Expression and Thought - https://arxiv.org/html/2508.01491v1
(59) AI-generated images of familiar faces are indistinguishable from real photographs - https://link.springer.com/article/10.1186/s41235-025-00683-w
(60) AI-AI Bias: large language models favor communications generated by large language models - https://arxiv.org/abs/2407.12856
Table of Contents
What do you need to know about the technology itself?
Things that will directly affect the quality of what you make
Understanding the limits of LLM judgement and feedback
Risks specific to visual media generation
IP, copyright, and commercial risk

