
Artificial Intelligence

Artificial Intelligence and the research process

Generative AI Algorithms

Understanding at a surface level how generative AI (Gen AI) operates helps to provide context for when to use it and why its outputs must be evaluated. Generative AI tools (e.g. ChatGPT) go beyond automated queries to build unique content, including images, text, audio, video, and code. Their training content is drawn from books, Wikipedia, the web, news, free articles, and social media, as illustrated below. Most of these tools do not draw on subscription library databases, except where the tool is built into the database itself. However, copyrighted works are also being used without permission, attribution, or compensation, which has resulted in multiple lawsuits.

[Image: ChatGPT datasets]

  • From the input prompt, individual words or parts of words are parsed into tokens, or units of data. Tokens, not sentences, are the basic unit of analysis.
  • Each token is represented by a numerical vector (a long list of numbers) that situates it in a multidimensional space. For example, the word cat would be represented by a vector of hundreds of numbers, which the model adjusts to reflect the context of the input prompt.
  • Outputs are constructed using complex algorithms that predict the most likely next word (token). These predictions are determined by weighting the frequency of words and the semantic and syntactic relationships between words.
  • Outputs may be incorrect or out of context, since they rely solely on the positions and patterns of similar words in the training data. Remember that misinformation and disinformation are part of the dataset.
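The bullets above can be illustrated with a toy next-word predictor. This is a minimal sketch under stated assumptions, not how ChatGPT works internally: it splits text on spaces (real tools use sub-word tokens), skips the vector step entirely, and simply picks the word that most often followed the previous word in a tiny, made-up training text, as a stand-in for a real model's learned probabilities. The helper names (`tokenize`, `train_bigrams`, `predict_next`, `generate`) and the corpus are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens.

    Assumption: real LLM tokenizers split text into sub-word pieces;
    splitting on spaces is a simplification for illustration.
    """
    return text.lower().replace(".", " .").split()


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each other token in the training text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Extend the prompt one word at a time, using only patterns in the training text."""
    tokens = tokenize(prompt)
    for _ in range(max_words):
        nxt = predict_next(counts, tokens[-1])
        if nxt is None or nxt == ".":
            break
        tokens.append(nxt)
    return " ".join(tokens)


# Hypothetical training text; the last sentence is deliberately false.
corpus = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "The moon is made of cheese.",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # cat ("cat" follows "the" twice)
print(generate(model, "The moon"))  # the moon is made of cheese
```

Because the toy model relies solely on patterns in its training text, it completes "The moon" with the false statement it was given, which mirrors why real outputs must be evaluated.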

 

Related Readings

Koenig, A. (2020). The algorithms know me and I know them: Using student journals to uncover algorithmic literacy awareness. Computers and Composition, 58, 102611. 

A jargon-free explanation of how AI large language models work (2023). Offers a visual model.

Representation of words in multidimensional space (A gentle introduction to vector space).

Two-dimensional vector space
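To make the idea of a vector space concrete, here is a minimal sketch with invented two-dimensional word vectors. The coordinates are hypothetical (real models learn vectors with hundreds or thousands of numbers from training data), and cosine similarity is used as one common way to measure how closely two vectors point in the same direction, so that words with related meanings show up as close together.

```python
import math

# Hypothetical 2-D word vectors, invented for illustration only.
word_vectors = {
    "cat": (0.9, 0.8),
    "kitten": (0.85, 0.9),
    "dog": (0.9, 0.3),
    "car": (-0.7, 0.2),
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector points most nearly the same way as `word`'s."""
    target = word_vectors[word]
    others = (w for w in word_vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(target, word_vectors[w]))


for other in ("kitten", "dog", "car"):
    score = cosine_similarity(word_vectors["cat"], word_vectors[other])
    print(other, round(score, 3))
print(nearest("cat"))  # kitten
```

With these made-up coordinates, "kitten" is the nearest neighbour of "cat", while "car" points in a very different direction (its similarity to "cat" is negative).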

Word vectors are processed through many layers of transformation.

The largest version of GPT-3 uses 96 layers, in which each word is represented by a vector of 12,288 numbers.

Each layer adds information to clarify word meaning and better predict the next word.
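The layered processing described above can be sketched with a toy example. Everything here is an assumption for illustration: the vectors have 3 numbers instead of 12,288, there are 4 layers instead of 96, and the hypothetical `layer` function simply adds a share of the prompt's average vector to every word, a crude stand-in for the learned attention and feed-forward steps real transformer layers perform. The point is only that, after a few layers, the vector for "bank" carries information from "river", which is the kind of context that helps settle which meaning of "bank" is intended.

```python
NUM_LAYERS = 4  # toy value; the largest GPT-3 model uses 96 layers


def layer(vectors, mix=0.5):
    """Hypothetical toy layer: add a share of the prompt's average vector to each word.

    Real transformer layers use learned weights; this averaging step is only
    a stand-in for "mixing in context from the surrounding words".
    """
    dims = len(vectors[0])
    context = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    return [[v[d] + mix * context[d] for d in range(dims)] for v in vectors]


def run_layers(vectors, num_layers=NUM_LAYERS):
    """Pass the word vectors through a stack of toy layers, one after another."""
    for _ in range(num_layers):
        vectors = layer(vectors)
    return vectors


# Hypothetical starting vectors: each word begins with no information about the others.
prompt = {"the": [1.0, 0.0, 0.0], "river": [0.0, 1.0, 0.0], "bank": [0.0, 0.0, 1.0]}
final = dict(zip(list(prompt), run_layers(list(prompt.values()))))

print("bank before:", prompt["bank"])
print("bank after: ", [round(x, 3) for x in final["bank"]])

# Scale check using the figures above: a 10-token prompt in the largest GPT-3
# model is described by 10 * 12,288 numbers at each of its 96 layers.
print(10 * 12_288)  # 122880
```

In the toy output, the "bank" vector's second number (the "river" direction) grows from 0 to a positive value, while its own original direction still dominates.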