AI-powered Typo Hunting: Trust Your Docs, Readers Will
Our documentation has a trust problem, and I just found 142 reasons why. It started with a silly typo I noticed on one of the pages – something like “cotnact” instead of “contact”. It was quick to fix, but it got me thinking: are there more?
Third‑party writing assistants are available as browser extensions, and we also have a spelling mistake checker available within Jetpack. With such tools, it’s easy to catch typos while editing a page, BUT only on the one page you currently have open in edit mode.
Why it’s a problem
Typos can negatively impact our company’s credibility, giving the impression of negligence or lack of expertise.
Solution
I wanted a better approach—proactive rather than reactive.
Fortunately, the WordPress.com public API made it easy to build an automated solution. I used it to fetch every Jetpack.com support page and sent each page’s content to the GPT‑4o model (a multilingual, multimodal generative pre‑trained transformer developed by OpenAI) with this prompt:
prompt = """Your task is to check the provided text in American English for accidental typos.
List all obvious typo errors in the provided text and propose a replacement.
Do not list any of those:
- punctuation errors,
- grammar errors,
- typos in html attributes,
- typos in code snippets,
- words including HTML special characters.
"""
Results and next steps
I ended up with 142 pages that required our attention. Some of the detected typos may be false positives, some may need a review by a native speaker, but many are accurately identified typos (“Keet”, “Nexdoor”, “perfomance”).
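A cheap sanity check can weed out some of those false positives before a human looks at the list. The sketch below is not part of the script I ran. It assumes the model's answer has the `typos` and `replacements` arrays from the JSON schema, and `filter_typos` plus the sample data are hypothetical names made up for illustration. It drops any pair where the "fix" changes nothing or where the reported word does not appear on the page at all.

```python
def filter_typos(content: str, result: dict) -> list[tuple[str, str]]:
    """Return (typo, replacement) pairs that survive basic sanity checks.

    `result` is assumed to look like {"typos": [...], "replacements": [...]}.
    """
    pairs = zip(result.get("typos", []), result.get("replacements", []))
    kept = []
    for typo, replacement in pairs:
        if not typo or typo == replacement:
            continue  # the proposed fix changes nothing
        if typo not in content:
            continue  # the model reported a word that is not on the page
        kept.append((typo, replacement))
    return kept


# Hypothetical page content and model output, for illustration only
sample_content = "Improve site perfomance and cotnact support."
sample_result = {
    "typos": ["perfomance", "cotnact", "Jetpack", "recieve"],
    "replacements": ["performance", "contact", "Jetpack", "receive"],
}
print(filter_typos(sample_content, sample_result))
# prints [('perfomance', 'performance'), ('cotnact', 'contact')]
```

This only catches the mechanical mistakes. Brand names and borderline spellings still need a human reviewer.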
Cleaning up typos in the Jetpack.com documentation – work in progress.
Curious about the technical details? Here’s the code I used:
from openai import OpenAI
import json
import requests
import pandas as pd

client = OpenAI()


def get_wp_posts(id, type):
    # Base URL for the WordPress.com API request
    base_url = f"https://public-api.wordpress.com/rest/v1.1/sites/{id}/posts/?type={type}"
    page = 1  # Start from the first page
    # List to store the post IDs and URLs
    posts_data = []
    while True:
        # Append the page number to the base URL
        url = f"{base_url}&page={page}"
        response = requests.get(url)
        if response.status_code == 200:
            data = response.json()
            posts = data.get('posts', [])
            if not posts:
                break  # Break the loop if no posts are returned
            for post in posts:
                posts_data.append({'id': post['ID'], 'url': post['URL'], 'content': post['content']})
            page += 1  # Increment the page number for the next request
        else:
            print(f"Failed to retrieve data: {response.status_code} {response.text}")
            break
    # Convert list of posts to DataFrame
    return pd.DataFrame(posts_data)


def find_typos(x):
    prompt = """Your task is to check the provided text in American English for accidental typos.
List all obvious typo errors in the provided text and propose a replacement.
Do not list any of those:
- punctuation errors,
- grammar errors,
- typos in html attributes,
- typos in code snippets,
- words including HTML special characters.
"""
    response = client.responses.create(
        model="gpt-4o-2024-08-06",
        input=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": x}
        ],
        text={
            "format": {
                "type": "json_schema",
                "name": "typos",
                "schema": {
                    "type": "object",
                    "properties": {
                        "typos": {"type": "array", "items": {"type": "string"}},
                        "replacements": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["typos", "replacements"],
                    "additionalProperties": False
                },
                "strict": True
            }
        }
    )
    print(response.output_text)
    return json.loads(response.output_text)


df = get_wp_posts(20115252, "jetpack_support")
df["typos"] = df["content"].apply(find_typos)
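Once the scan finishes, the results still have to reach whoever fixes the pages. Here is a minimal sketch of one way to flatten them into a review list, one row per typo. It assumes each record carries the `url` and `typos` fields produced by the script above (for example via `df.to_dict("records")`). The helper names `build_review_rows` and `rows_to_csv` and the example.com URLs are hypothetical and only there for illustration.

```python
import csv
import io


def build_review_rows(records):
    """Flatten per-page results into one row per (url, typo, replacement)."""
    rows = []
    for record in records:
        result = record["typos"]
        for typo, replacement in zip(result["typos"], result["replacements"]):
            rows.append({"url": record["url"], "typo": typo, "replacement": replacement})
    return rows


def rows_to_csv(rows):
    """Serialize the review rows as CSV text that opens in any spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "typo", "replacement"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records in the shape df.to_dict("records") would produce
records = [
    {"url": "https://example.com/support/a",
     "typos": {"typos": ["perfomance"], "replacements": ["performance"]}},
    {"url": "https://example.com/support/b",
     "typos": {"typos": [], "replacements": []}},
    {"url": "https://example.com/support/c",
     "typos": {"typos": ["Keet", "Nexdoor"], "replacements": ["Keep", "Nextdoor"]}},
]

rows = build_review_rows(records)
pages_needing_attention = len({row["url"] for row in rows})
print(pages_needing_attention)
print(rows_to_csv(rows))
```

Pages with no reported typos produce no rows, so counting the distinct URLs gives the number of pages that need a look.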
Your turn, it is.
Typos might seem small, but they speak volumes about professionalism and attention to detail. How confident are you about your own content? Have you thought about doing something similar on your site or blog? What approach did you take? Are you ready to try this method? Or maybe you have AI prompt ideas beyond spell‑checking? Let us know in the comments!
#ArtificialIntelligence #NaturalLanguageProcessing #SemanticSearch #WordPressCom