So time to switch to full "old man yells at cloud computing" energy.
I'm trying to migrate some databases from a creaky old EC2 instance to RDS. I've set everything up so I can do a clean cut-over and verified that the connection works from both the EC2 instance and a bastion instance.
So how to move the data? The obvious solution after a quick poke around in AWS' documentation is the Database Migration Service (DMS), which has a handy "pay exactly what you use and no more" serverless option alongside the usual EC2-instance-in-a-funny-hat option.
Setup is simple, and the console does the usual thing of creating all the bits you need outside of the core service, with a few streamlined steps to make ClickOps simpler.
All good, check the documentation, check some random tutorial, no errors, no muss, no fuss, click "Start".
And it sits there doing stuff for ~10 minutes, posting terse logs to a log group, and then crashes with an "internal error".
If I never see the word "internal error" (or "unknown error") ever again, I'll die a happy man. But this is the real world, so I won't.
I'm migrating a WordPress instance's database here, which is hardly heavy lifting, so what the heck is going on?
It turns out that a grand total of _two_ people have seen this error before me.
This is a tool with a fancy modern UI and integrated help and everything, this isn't some fly-by-night tool hacked into the side of some other unrelated console panel ... like ELB.
The best guess from the info I'd found is that it fails like this if you don't assign enough compute to it.
This is a database with a couple of hundred megabytes of data at most, and the throughput probably peaks at an update a second when people aren't editing it. This isn't big data, this is barely even small data. Why the heck it needs more than 1 vCPU and 2 GB of RAM is beyond me.
But ok, let's bump it up to 4. Except that you can't actually do that because this weird failed state somehow prevents you from editing it. Including with the CLI.
Ok, fair enough, this is old enough to have Terraform bindings, so off to recreate the 3 ... 9 ... 12 resources in Terraform, then I can delete and recreate instead of updating. Easy, right?
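For the curious, the Terraform ends up looking roughly like this. It's a sketch: the resource types and attributes are from the AWS provider's `aws_dms_replication_config`, but the identifiers, the endpoint and subnet group resources it references, and the capacity numbers are placeholders rather than my actual config.

```hcl
# Sketch of the DMS Serverless replication config in Terraform.
# The source/target endpoints and subnet group are assumed to be
# defined elsewhere; all names and values here are placeholders.
resource "aws_dms_replication_config" "wordpress" {
  replication_config_identifier = "wordpress-to-rds"
  replication_type              = "full-load"

  source_endpoint_arn = aws_dms_endpoint.source.endpoint_arn
  target_endpoint_arn = aws_dms_endpoint.target.endpoint_arn

  # Table mappings: just take everything in the wordpress schema.
  table_mappings = jsonencode({
    rules = [{
      "rule-type"   = "selection"
      "rule-id"     = "1"
      "rule-name"   = "include-wordpress"
      "rule-action" = "include"
      "object-locator" = {
        "schema-name" = "wordpress"
        "table-name"  = "%"
      }
    }]
  })

  # The "serverless" knob: capacity is measured in DMS Capacity Units.
  # Raising max_capacity_units is the supposed fix for the internal error.
  compute_config {
    replication_subnet_group_id = aws_dms_replication_subnet_group.this.replication_subnet_group_id
    vpc_security_group_ids      = [aws_security_group.dms.id]
    min_capacity_units          = 1
    max_capacity_units          = 4
  }
}
```

And because the wedged config can't be modified (neither the console nor the CLI will have it), the tuning loop is destroy-and-recreate every time — `terraform destroy` and `apply`, or `terraform apply -replace=aws_dms_replication_config.wordpress` — which is exactly where the cycle time below comes from.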
A note on UX here: if the cycle time is small, a lot of sins can be forgiven. For example, if I'm hacking on an ECS service and task definition, my cycle time can be as small as a couple of minutes, because it generally spins up new tasks a few tens of seconds after something is updated.
DMS's cycle time is close to half an hour. Because I can't edit, it's ~10 minutes to destroy the failed task (or configuration, the documentation can't make up its mind), ~30 seconds to recreate it, then more than 10 minutes to find out whether it failed again.
So in total today, ~6 hours on this: I've done ~3 cycles fixing all the connection issues (it can't resolve .internal hostnames?!), moved it to Terraform, and then done a grand total of _two_ cycles of actually trying to tune the configuration until it works.
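For the record, the source endpoint ends up looking something like the sketch below. The values are placeholders, and pointing `server_name` at the instance's private IP instead of the unresolvable .internal hostname is just the crude way around the DNS issue; it's an assumption on my part that there isn't a cleaner knob for this buried somewhere in the compute config.

```hcl
# Password for the old database, kept out of the config itself.
variable "dms_source_password" {
  type      = string
  sensitive = true
}

# Source endpoint for the old EC2-hosted MySQL database.
# Placeholder values; the point is that server_name is a private IP
# (or anything resolvable) rather than an ip-10-x-x-x.ec2.internal
# style hostname that DMS apparently can't resolve.
resource "aws_dms_endpoint" "source" {
  endpoint_id   = "wordpress-source"
  endpoint_type = "source"
  engine_name   = "mysql"

  server_name = "10.0.12.34" # private IP of the EC2 instance
  port        = 3306
  username    = "dms_user"
  password    = var.dms_source_password
}
```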
This is painful.
And it's downright insulting that the issue I'm running into, whatever it is, is something known and apparently fixable.
But nobody could be bothered to add a hint to the error message.
Now I get hiding internal details when you're trying to keep your database structure secret from hackers, but I'm a fricking developer working over an authenticated API, trying to debug your service. The call is coming from inside the building.
I heard a rumour that AWS middle managers got bonuses based on the number of services released, and this is why (nearly) everything serverless post-Lambda sucks. (ElastiCache Serverless, though, is fricking magic.)
I've also done a bit more reading and apparently DMS likes to eat your data and crash. So if the "quality" of this serverless offering is anything to go by, I'll be surprised if I can get this working, and if it successfully "completes" my migration, I'll be surprised if the data is correct.
Sigh.
AWS please fix your existing tools before shitting out broken services. And nobody needs Q.
#aws #tech #cloudcomputing #clowncomputing