I’ve said this before, but it feels like AI is in everything these days. The hype generates a lot of bad press for us stenographers, in that people believe we are or will soon be replaceable. We can further extrapolate from the Pygmalion effect that those beliefs impact reality.
As many know, I’m an amateur programmer. I know relatively little about the top-of-the-line tech and can only code on a very basic level. That said, the more I learn conceptually, the more I’m in awe of just how far computers have come, and how far they have to go. You see it every day on your smartphone and in your steno software. Computers are hard at work and designed to do amazing things.
Here is the thing about computers: They only do what you tell them to do. You have to come up with a set of instructions, an algorithm, that gets them from point A to point B. They solve problems, but only using the instructions you give them. Even if you come up with the instructions, the results can be useless. We can imagine problems as mathematically solvable or unsolvable, finite or infinite. An example of an infinite problem is listing the entire Fibonacci sequence. You take the last two numbers in the sequence and add them together to get the next one. This stretches into infinity. You can easily write a program to generate Fibonacci numbers, but the computer would die before generating them all because there are infinitely many of them.
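To make the Fibonacci point concrete, here is a minimal sketch in Python (my choice of language for illustration). The function name `fibonacci` is just a label I picked. The generator itself never finishes, so the only way to get anything useful out of it is to decide in advance how many numbers you want.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers forever; each new term is the sum of the previous two."""
    a, b = 0, 1
    while True:  # no stopping condition: the sequence has no last term
        yield a
        a, b = b, a + b


# The generator is endless, so we cut it off ourselves after ten terms.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The instructions are a handful of lines, but running them "to completion" is impossible. The limit is in the problem, not in the code.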
Then there are solvable problems. Chess is considered a solvable problem because it is a game with a finite number of pieces, spaces, and moves. There’s a problem, though. There are so many moves in chess that just the datasets for having 7 pieces at the end of the game (Lomonosov tablebases) are said to be 140 terabytes of information. To put that into perspective, it’s been estimated that all the books in the world would fit in about 60 terabytes, so the seven-piece endgames alone take up more than twice that. Even if you had a supercomputer capable of generating every possible move in chess, the information would be absolutely useless to you, because to digest all of it would be the equivalent of reading every book ever written thousands of times.
So let’s think of AI and audio in terms of problem solving. The most basic way to describe Alexa and Siri is that they listen to you for keywords, check what you say against their database, and decide what to do based on that algorithm we talked about. Let’s face it, there are only maybe 200,000 words in the English language. You could store every single one as a large audio file in less than 700 GB. Here is the deal: computers don’t hear in the traditional sense. They’re taking what you say and presenting educated guesses based on all the data they have. So now, if you will, imagine all 200,000 English language words and every combination they could possibly be in. To put it in perspective, it is a way bigger number than this. Now let’s add all the different ways words might be said, or all the different scenarios that might interfere with how the computer is “hearing.” Let’s add all the different accents and dialects of English.
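For a rough feel of how fast those combinations pile up, here is a back-of-the-envelope sketch. It assumes "combination" means any ordered string of words with repeats allowed, and it uses the 200,000-word figure from above. The name `ordered_sequences` is hypothetical, and this says nothing about how real speech recognition software is built. It is only counting.

```python
VOCABULARY_SIZE = 200_000  # the rough word count used in this post, not an exact figure


def ordered_sequences(length, vocabulary=VOCABULARY_SIZE):
    """Count every ordered string of `length` words, with repeats allowed."""
    return vocabulary ** length


for length in (1, 2, 5, 10):
    count = ordered_sequences(length)
    print(f"{length:>2} words: about {count:.2e} possible sequences")
```

Most of those strings are gibberish no one would ever say, so treat this as an upper bound. Even so, a ten-word utterance already has on the order of 10^53 possible word sequences, before accents, noise, or crosstalk enter the picture.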
Let me say this: It is very likely, in my mind, that someday computers will be programmed to hear as well as stenographers in any given situation. It’s a solvable problem. It’s a winnable game. But right now, based on what I know, there’s an indeterminate amount of time and money that it’ll take to get to the point where it is reliable, 95 percent accurate or better, in most or all scenarios and in a reasonable amount of time. Take for a moment the example of Solar Roadways. Pave the roads in solar panels to solve America’s energy crisis. Millions of dollars were poured into this solution, and it failed. Remember, solvable problem, winnable game. Finite number of people with finite energy needs. Failed anyway. Speech-to-text is estimated to be worth billions of dollars. But what if it takes 100 more years to solve? How many millions or billions of dollars need to be lost before the solution is declared “good enough”? Remember, they can sell Alexa and Dragon today for piles of money. They don’t need 95 percent. The exponential growth of computing power has ended, and unless the experts bring us quantum computing or some other huge leap in technology, we’re looking at computers costing more money to upgrade.
Those companies you see that are touting transcription AI in 2019 are doubtlessly having transcribers fix AI-prepared transcripts at best. Their game is psychological. It’s not cost saving, it’s cost shifting from the worker to the boss. That’s why it’s not being sold to the public. It’s a magic trick. Look to the left while the magician rolls the coin to the right. It is in our best interest as stenographers to call this out when appropriate, and continue to bolster our own magic skills and industry as the go-to for the hearing impaired and legal communities. Could some geniuses come along and program your replacement next year? Sure. But one thing that you should understand is that it’s not very likely, and buying the hype before they have a product to sell is only going to hurt our morale and livelihoods. We have our method. We have a product. We’ve got more brains, voters, and history in the field. So do yourself and all of us a favor: don’t buy the hype, and the next time you meet a transcriber working for Fake AI Transcription Corp, LLC, tell them they can double their earnings and better themselves by joining the stenographic legion. If a supercomputer is required to solve chess, what do you believe is required to get automatic speech recognition to 95 percent?
May 26, 2019 Edit:
I should add that it’s obvious computers are becoming ruthlessly good at transcribing one speaker, especially in a closed or suitable environment. There are hours of video on that. It’s the introduction of multiple speakers in a less-than-perfect environment where the thing struggles, probably because of all those mathematical issues talked about above.
June 18, 2019 Edit:
A post recently made its rounds on social media claiming a computer science PhD couldn’t see the perfect transcription coming out any time soon. It stands in stark contrast to the claims of some that the technology is already perfect.
August 17, 2019 Edit:
Another article came to light showing that Facebook Messenger and other automatic transcription apps are actually using human transcribers behind the scenes. Using my amateur knowledge of computer coding, I can say this is clear evidence that they need data (the transcriptions) to feed into the machine learning algorithms. Further, if they’re not paying their transcribers exceptionally well and bad data is being fed in, it could ultimately make automatic transcription programs worse. Expect some pretty big delays on the AI transcription front.
August 25, 2019 Edit:
I created a “mock voice recognition video” just to prove how easy it would be for a company to lie about its voice recognition progress. I coded a computer program that spits back whatever text you give it at a set number of words per minute. So next time you’re at an automatic transcription demonstration, ask yourself if what you’re seeing is automatic or staged. I often give the example of Project Natal and Peter Molyneux. Gamers were made to believe that the Milo demonstration of Project Natal was a showcase of technology that was coming out. The truth broke years later that the demonstration was heavily scripted, and over ten years later, no such technology exists. Similarly, when someone tells you that their audio transcription program is flawless, question whatever you’re seeing and realize how easy it is to stage and sell things.
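For anyone curious how little it takes, here is a minimal sketch of that kind of program. It is not the code from the video, just a fresh illustration in Python, and the names `seconds_per_word`, `paced_words`, and `fake_transcribe` are made up for this example. It does no listening at all. It prints text it was handed in advance, one word at a time, at whatever pace you ask for.

```python
import time


def seconds_per_word(words_per_minute):
    """Convert a words-per-minute pace into the pause needed after each word."""
    return 60.0 / words_per_minute


def paced_words(text, words_per_minute, sleep=time.sleep):
    """Yield the words of `text` one at a time, pausing after each to hold the pace."""
    delay = seconds_per_word(words_per_minute)
    for word in text.split():
        yield word
        sleep(delay)  # `sleep` can be swapped out, e.g. to skip the waiting


def fake_transcribe(text, words_per_minute=225, sleep=time.sleep):
    """Print pre-written text as if it were being recognized live."""
    for word in paced_words(text, words_per_minute, sleep):
        print(word, end=" ", flush=True)
    print()


if __name__ == "__main__":
    # No audio is involved anywhere; this sentence was typed ahead of time.
    fake_transcribe("This was typed in advance.", words_per_minute=225)
```

Play a recording of the same sentence next to the screen and it looks like flawless live recognition, which is exactly why a demo on its own proves nothing.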