CART v Autocraption, A Strategic Overview For Captioners

With the news that Verbit has bought VITAC, there was some concern on steno social media. For a quick history on Verbit: it’s a company that claimed 99 percent accuracy at its series A funding. By its series B funding, it admitted that its technology would not replace the human. Succinctly, Verbit is a transcription company whose transcribers are assisted by machine learning voice recognition. Of course, this all has the side effect of demoralizing stenographers, who sometimes think “wow, the technology really can do my job,” because nobody has the time to be a walking encyclopedia and fact-check every claim.

But this idea that Verbit, a company started in 2016, figured out some super-secret knowledge is not realistic. To put voice recognition into perspective, it’s estimated to be a market worth many billions of dollars. Microsoft is seeking to buy Nuance, the maker of Dragon, for about $20 billion. Microsoft has reportedly posted revenue of over $40 billion and profit of over $15 billion in a single quarter. Verbit, by comparison, has raised “over $100 million” in investor money. It reports revenue in the millions and positive cash flow. Another company that reports revenue in the millions and positive cash flow? VIQ Solutions, parent of Net Transcripts. As described in a previous post, VIQ Solutions has reported millions in revenue and a positive cash flow since 2016. What’s missing? The income. Since 2016, the company hasn’t been profitable.

I might actually buy some stock, just in case.

Obviously, things can turn around: companies can go long periods of time without making a profit, bounce back, and be profitable. Companies can also go bankrupt and dissolve a la Circuit City, or be restructured like JCPenney. The point is not to disparage companies on their financials, but to give stenographic captioners real perspective on the information they’re reading. So, when you see this blurb here, what comes to mind?

Critical Thinking 101

Hint. What’s not being mentioned? Profit. While this is not conclusive, the lack of any mention of profit tells me the cash flow and revenue are fine, but there are no big profits as of yet. Cash flow can come from many things, including investors, asset sales, and borrowing money. Most of us probably make in the ballpark of $50,000 to $100,000. Reading that a company raised $60 million, ostensibly to cut in on your job, can be pretty disheartening. Not so once you see that they’re a tiny fraction of the overall picture and that players far bigger than them have not taken your job despite working on the technology for decades.

Moreover, we have a consumer protection crisis on our hands. At least one study in 2020 showed that automatic speech recognition can be 25 to 80 percent accurate depending on who’s speaking. There are many caption advocates out there, such as Meryl Evans, trying to raise awareness on the importance of caption quality. The messaging is very clear: automatic captions are crap (autocraptions), they are often worse than having no captions, and a single wrong word can cause great confusion for someone relying on the captions. Just go see what people on Twitter are saying about #autocraptions. “#NoMoreCraptions. Thank you content creators that do not rely on them!”

Caring about captioning for people who need it makes your brand look good?
I wonder if a brand that looks good makes more money than one that doesn’t…

This isn’t something I’m making up. Anybody in any kind of captioning or transcription business agrees a human is required. Just check out Cielo24’s captioning guide and accuracy table.

Well, this is a little silly. Nobody advertises 60 percent accuracy. It just happens. Ask my boss.

If someone’s talking about an accuracy level of 95 percent or better, they’re talking about human-verified captions. If you, captioner, were not worried about Rev taking away your job with its alleged 50,000 transcribers, then you should not throw in the towel because of Verbit and its alleged 30,000 transcribers. We do not know how much of that is overlap. We do not know how much of that is “this transcriber transcribed for us once and is therefore part of our ‘team.’” We do not know how well transcription skills will fit into the fix-garbage-AI-transcription model. The low pay and mistreatment that come with “working for” these types of companies are going to drive people away. Think of all the experiences you’ve had to get you to your skill level today. Would you have gotten there with lower compensation, or would you have simply moved on to something easier?

Verbit’s doing exceptionally well in its presentation. It makes claims that would cost quite a bit of time and/or money to disprove, and the results of any such investigation would be questioned by whoever it did not favor. It’s a very old game of making claims faster than they can be disproven and watching the fact checkers give you more press as they attempt to parse what’s true, partially true, and totally false. This doesn’t happen just in the captioning arena; it happens in legal reporting too.

$0/page. Remember what I said about no profit?
It doesn’t matter if they’re never profitable. It only matters that they can keep attracting investor money.

This seems like a terrifying list of capabilities. But, again, this is an old game. Watch how easy it is.

It took me 15 seconds to say six lies, one partial truth, and one actual truth. Many of you have known me for years. How long will it take you to figure out what was what? How long would it take you to prove to another person what’s true and what’s false? This is, in part, why it is easier for falsehoods to spread than the truth. This is why in court and in science, the person making a claim has to prove their claim. We have no such luxury in the business world. As an example, many years ago in the gaming industry, Peter Molyneux got up on stage and demo’d Milo. He said it was real tech. Here was this dynamically interactive virtual boy who’d be able to understand gamers and their actions. We watched it with our own eyes. It was so cool. It was BS. It was very likely scripted. There was no such technology, and there is no such technology today, over eleven years later. Do you think Peter, Microsoft, or anybody got in trouble for that? Nope. In fact, years later, he claimed “it was real, honest.”

Here’s the point: Legal reporters and captioners are going to be facing off with these claims for an indeterminate amount of time. These folks are going to be marketing to your clients hard. And I just showed you via the gaming industry that there are zero consequences for lying and that anything that is lied about can just be brushed aside with another lie. There will be, more or less, two choices for every single one of you.

  1. Compete / Advocate. Start companies. Ally with deaf advocates.
  2. Watch it happen.

I have basically dedicated Stenonymous to providing facts, figures, and ways that stenographers can come out of the “sky is falling” mindset. But I’m one guy. I’m an official in New York. Science says there’s a good chance what we expect to happen will happen, and that’s why I fight like hell to get all of you to expect us to win. That’s also why these companies repeat year after year that they’re going to automate away the jobs even when there’s zero merit or demand for the idea. You now see that companies can operate without making any profit, that companies can lie, that much bigger companies haven’t muscled in on your job, and that the giant Microsoft presumably looked at Verbit, looked at Nuance, and chose Nuance.

I’m not a neo-luddite. If the technology is that good, let it be that good. Let my job vanish. Fire me tomorrow. But facts are facts, and the fact is that tech sellers take the excellent work of brilliant programmers and say the tech is ready for prime time way before it is. They never bother to mention the drawbacks. Self-driving cars and trucks are on the way; don’t worry about whether they kill someone. Robots can do all these wonderful things; forget that injuries are up where they’re in heaviest use. Solar Roadways was going to solve the world’s energy problems but couldn’t meaningfully generate energy or hold up to being driven on. In our field, lives and the interests of important stakeholders are in danger. What happens when there’s a hurricane on the way and the AI captioning tells deaf people to drive towards danger?

Again, two choices, and I’m hoping stenographic captioners don’t watch it happen.

Aggressive Marketing — Growth or Flailing?

During our Court Reporting & Captioning Week 2021 there were a couple of press releases, and some press releases dressed up as journalism, all about digital recording and automatic speech recognition and their accuracy and viability. There’s actually a lesson to be learned from businesses that continually promise without any regard for reality, so that’s what I’ll focus on today. I’ll start with this statement. We have a big, vibrant field of students and professionals where everyone actually involved in it, from the smallest one-woman reporting armies to the corporate giants, says technology will not replace the stenographic court reporter. Then we have the tech players who continuously talk about how their tech is 99 percent accurate, but can’t be bothered to sell it to us, and whose brilliant plan is to record and transcribe the testimony, something stenographers figured out how to do decades ago.

Steno students are out there getting a million views and worldwide audiences…
And Chris Day? He’s posting memes on the internet.

You know the formula. First we’ll compare this to an exaggerated event outside the industry, and then we’ll tie it right into our world. So let’s breeze briefly over Fyre Festival. To put it in very simple terms, Fyre Festival was an event where the CEO overpromised, underdelivered, and played “hide the ball” until the bitter end. Customers were lied to. Investors were lied to. Staff and construction members were lied to. It was a corporate fiasco propped up by disinformation, investor money, and cash flow games that ended with the CEO in prison and a whole lot of people owed a whole lot of money that they will, in all likelihood, never get paid. It was the story of a relative newcomer to the industry of music festivals saying they’d do it bigger and better. Sound familiar?

As for relative newcomers in the legal transcription or court reporting business, take your pick. Even ones that have been incorporated for a couple of decades really aren’t that impressive when you start holding up the magnifying glass. Take, for example, VIQ Solutions and its many subsidiaries:

I promise to explain if you promise to keep reading.

VIQ apparently trades OTC, so it gives us a rare glimpse of financial information that we don’t get with a lot of private companies. Right off the bat, we can see some interesting stuff: $8 million in revenue with a negative net income and a positive cash flow. Positive cash flow means the money they have on hand is going up. Negative income means the company is losing money. How does a company lose money while its cash on hand continues to grow? Creditors and investors. When you see money coming in while the company is taking losses, it generally means that the company is borrowing the money or getting more cash from investors/shareholders. A company can continue on this way for as long as money keeps coming in. Companies can also use tricks similar to price dumping, and charge one client or project an excessive amount in order to fund losses on other projects. The amazing thing is that most companies won’t light up the same way Fyre did; they’ll just declare bankruptcy and move on. There’s not going to be a big “gotcha” parade or reckoning where anyone admits that stenographic court reporting is by far the superior business model.
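To make the revenue / income / cash flow distinction concrete, here is a tiny illustration with invented numbers (a simplified sketch, not VIQ’s or anyone’s actual figures, and it ignores non-cash items like depreciation):

```python
# Invented numbers for illustration only -- not any company's real financials.
revenue = 8_000_000              # cash customers paid for services
operating_expenses = 10_000_000  # what it cost to deliver those services

net_income = revenue - operating_expenses  # -2,000,000: the company is losing money
new_investment = 3_000_000                 # fresh cash from shareholders
new_borrowing = 1_000_000                  # fresh cash from creditors
change_in_cash = net_income + new_investment + new_borrowing  # +2,000,000: cash on hand still grows

print(f"Net income: {net_income:,}")          # Net income: -2,000,000
print(f"Change in cash: {change_in_cash:,}")  # Change in cash: 2,000,000
```

As long as the investment and borrowing lines keep flowing, the cash line can look healthy while the income line stays negative.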

This is juxtaposed against a situation where, for the individual stenographic reporter, you’re kind of stuck making whatever you make. If things go badly, bankruptcy is an option, but there’s never really an option to borrow money or receive investor money for decades while you figure it out. Seeing all these ostensible giants enter the field can be a bit intimidating or confusing. But any time you see these staggering tech reveals wrapped up in a paid-for press release, I urge you to remember Fyre, remember VIQ, and remember that no matter what that revenue or cash flow looks like, you may not have access to the information that would tell you how the company is really doing.

This also leads to a very bright future for steno entrepreneurs. As we learn the game, we can pass it along to each other. When Stenovate landed its first big investor, I talked about that. Court reporting and its attached services, in the way we know them and love them, are an extremely stable, winning investment. Think about it. Many of us, when we begin down this road, spend up to $2,000 on a student machine and up to $9,000 on a professional machine and software. That $11,000 sinkhole, coupled with student loan debt, grows into stable, positive income. So what’s stopping any stenographic court reporting firm from getting out there and educating investors on our field? The time and drive to do it. Maybe some people just haven’t had that idea yet. But that’s where we’re headed. I have little doubt that if we compete, we will win. But we have to get people in that mindset. So if you know somebody with that entrepreneurial spirit, maybe pass them this post and get them thinking about whether they’d like to seek investors to grow their firm and reach. Business 101 is that a dollar today is more valuable than a dollar tomorrow. That means our field can be extremely attractive to value investors and a safe haven from the gambling money being supplied to “tech’s” habitual promisors.

Know a great reporting or captioning firm that needs a spotlight? Feel free to write me or comment about them below. I’ll start us off. Steno Captions, LLC launched recently without doing the investor dance. That’s the kind of promise this field has. I wish them a lot of luck and success in managing clients and training writers.

Trolls and You

We try to keep political stuff from being published here unless it’s educational, about court reporting, or about the industry. I’ve been pretty good about this. Commentators have been great about it. The occasional guest writer has been amazing with it. This topic touches on politics, but it’s not strictly political, so it should be fun to learn about.

It’s established that the United Kingdom, United States, China, Russia, and several other countries view the internet as, more or less, another theater of war. They’ve had operatives and people hired to create fake posts and false comments, and to advance the interests and ideas of the government. The prices reported? Eight dollars for a social media post, $100 for ten comments, and $65 for contacting a media source. In the case of China, the people doing this are reportedly working for less than a dollar per post. If the host country allows it, you have trolls for hire.

So in the context of stenography and the court reporting industry, it seems like whenever we get into the news, there are regular comments from regular people, such as “why not just record it?” Typical question. Anyone would ask this question. There are fun comments like “Christopher Day the stenographer looks like he belongs on an episode of Jeopardy.” Then there are comments that go above and beyond that. They make claims like — well, just take a look.

“…I gonna tell you that in modern technology we can record something like court testimony for hundreds of years back very easily…” “…the technology is smarter every single second…” “…if you store data in the digital format we can use an AI to extract the word from the voice in the data, it will be very accurate so much so the stenographer loses their jobs.” Wow! Lose our jobs? I felt that in my heart! Almost like it was designed to hurt a stenographer’s feelings. Right?

We can store the video for hundreds of years? Maybe. But consider that text files, no matter which way you swing it, are at least ten times smaller than audio files. They can be thousands of times smaller than video files. Take whatever your local court is paying for storage today and multiply that by 8,000. Unless we want a court system that is funded by advertisements a la Youtube, the taxpayer will be forced to cough up much more money than they do today. That’s just storing stuff.
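The “multiply by 8,000” figure is easier to see with a back-of-the-envelope comparison. Here is a quick sketch using assumed, ballpark file sizes for one hour of proceedings (my own illustrative numbers, not measurements from any court system):

```python
# Rough, assumed sizes for ONE HOUR of a proceeding -- ballpark figures, not measurements.
transcript_text_kb = 250       # a plain-text transcript, roughly 50-60 pages
compressed_audio_kb = 30_000   # compressed audio; easily 10x+ the size of the text
video_kb = 2_000_000           # about 2 GB of standard-definition video

print(f"video is ~{video_kb // transcript_text_kb:,}x the size of the transcript")  # ~8,000x
```

Change the assumptions and the multiplier moves, but text stays orders of magnitude cheaper to store than audio or video.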

The technology is getting smarter every second? No, not really. Whenever it’s analyzed by anybody who isn’t selling it, it’s actually pretty dumb and has been that way for a while. Take Wade Roush’s May 2020 article in the Scientific American (pg 24). “But accuracy is another matter. In 2016 a team at Microsoft Research announced that it had trained its machine-learning algorithms to transcribe speech from a standard corpus of recordings with record-high 94 percent accuracy. Professional human transcriptionists performed no better than the program in Microsoft’s tests, which led media outlets to celebrate the arrival of ‘parity’ between humans and software in speech recognition.”

“…And four years after that breakthrough, services such as Temi still claim no better than 95 percent — and then only for recordings of clear, unaccented speech.” Roush concludes, in part, “ASR systems may never reach 100 percent accuracy…” So technology isn’t getting smarter every second. It’s not even getting smarter every half decade at this point.

“…we can use an AI to extract the word from the voice in the data…” This technology exists, kind of, but perfecting it would be like perfecting speech recognition. Nobody’s watching 500 hours of video to see if it accurately returns every instance of a word. Ultimately, you’re paying for the computer’s best guess. Sometimes that’ll be pretty good. Sometimes you won’t find the droid you’re looking for.

Conclusion? This person’s probably not in the media transcoding industry, probably doesn’t know what they’re talking about, and is in all likelihood a troll. Were they paid to make that comment? We don’t know. But I think it’s time to realize that marketplaces are ripe for deception and propaganda. So when you see especially mean, hateful, targeted comments, understand that there’s some chance that the person writing the comment doesn’t live in the same country as you and doesn’t actually care about the topic they’re writing about. There’s some chance that person was paid to spread an opinion or an idea. Realizing this gives us power to question what these folks are saying and be agents of truth in these online communities. If trolling is always ignored, the trolls end up leading the conversation. So dropping the occasional polite counterview when you see an obvious troll can make a real impact on perception. The positive perception of consumers and the public is what keeps steno in business.

The best part of all this? You can rest easier knowing some of those hateful things you see online about issues you care about are just the work of hired thugs trying to divide us. If a comment is designed to hurt you, you might just be talking to a Russian operative.

Addendum:

I understand readers will be met with the Scientific American paywall. I would open myself up to copyright problems to display the entire article here. If you’d like to speak out against the abject tyranny of paywalls, give me money! I’m kidding.

What Verbit Leadership Needs To Know

I had a lot of fun writing the Verbit investors article. But the more I explore opinions and ideas outside our steno social circles, the more I see that most people totally don’t get stenographers or the work we put in. A lot of us have had sleepless nights trying to get a daily out, time lost for ourselves or our families trying to do the job we signed up for, or some amount of stress from someone involved with the proceeding being unhelpful or antagonistic. It happens, we take it in stride, and we make the job look easy. So it doesn’t surprise me very much when people say “why not just record it?” It doesn’t surprise me that investors threw money into the idea that technology could disrupt the court reporting market. But I can only hope that proponents of digital really take the time to understand and step back from the cliff they’re being pushed towards.

For this exercise, we’re going to be exploring Verbit’s own materials. They recently circulated a graphic that showed the “penetration” of digital into the court reporting market. It shows 5 to 10 percent of the deposition market taken by digital, and 65 to 75 percent of the court market taken by digital. It also notes that only 25 to 35 percent of courts are digitally transcribed. I take this to mean that while they have 75 percent of the “court market,” they only transcribe about 25 percent of it. This is a massive problem. So the technology, when it’s not breaking down in the middle of court (29:20), is ready to record all these proceedings. But you only have the capacity to transcribe about a third of that. So in this magical world where suddenly you have every deposition, EUO, and court proceeding, where are you going to get all of these people? We’re talking about multiplying your current workforce by 28, assuming that every person you hire is as efficient as a stenographer. And the math shows that every stenographer is about as efficient as 2 to 6 transcribers. So we’re really talking about multiplying your current workforce by 56 to 168, or just creating larger backlogs than exist today. By not using stenographers, Verbit and digital proponents are setting themselves up for an epic headache.
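Following the post’s own figures (the 28x multiplier and the 2-to-6 transcribers per stenographer estimate above, taken as assumptions rather than independent data), the arithmetic looks like this:

```python
# Assumptions pulled from the argument above -- not independent data.
workforce_multiplier = 28        # hires needed if every hire matched a stenographer's output
transcribers_per_steno = (2, 6)  # one stenographer covers the work of roughly 2-6 transcribers

low, high = (workforce_multiplier * t for t in transcribers_per_steno)
print(f"actual multiplier: {low}x to {high}x the current transcription workforce")  # 56x to 168x
```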

Of course, this is met with, “well, there’s a stenographer shortage.” But what you have to understand is that we’ve known that for seven years now. All kinds of things have happened since then. You’ve got Project Steno, Open Steno, StenoKey, A to Z, Allison Hall reportedly getting over a dozen school programs going. Then you have lots of people just out there promoting or talking about the field through podcasts, TV, and other news. Showcasing the shortage and stenography has brought renewed interest in this field, and we are on track to solve this. Again, under the current plan, you would need as many as 60,000 transcribers just to fill our gap, and the turnover will probably be high because the plan promotes using a workforce that does not require a lot of training. So if you’re talking about training and retraining 60,000 people again and again over the next decade, I am quite sure you can find 10,000 or so people who want to be stenographic court reporters.

Look, I get it, nobody goes into business without being an optimist. But trying to upend a field with technology that doesn’t exist yet is just a frightening waste of investor money. How come when you sell ASR, it’s 99 percent accurate, but when Stanford studies the ASR from the largest companies in the world, it’s 60 to 80 percent accurate? How come when you sell digital it’s allegedly cheaper and better, but when it’s looked at objectively it’s more expensive and comes with “numerous gaps and missing testimony?” These are the burning questions you are faced with. There’s an objectively easier way: partnering with and hiring stenographers. If you don’t, you’re looking at filling a gap of 10,000 with 60,000 people, or multiplying the current transcription workforce of 50,000 by 56 (2.8 million). In a world of just numbers, this sounds great. Three million jobs? Who wouldn’t want that? But not far into this experiment you’ll find that people don’t grow on trees and the price of the labor will skyrocket unless you offshore all of the work. What happens when attorneys catch onto the fact that everything is being offshored and start challenging transcripts? Does anyone believe that someone in Manila is going to honor subpoenas from New York? Again, epic headache.

So if I could get just one message out to Verbit leadership and all the people begging for us to “just accept technology,” it would be to really re-examine your numbers and your tech. The people under you are going to tell you that a new breakthrough is just around the corner, that things are going well, and that you shouldn’t worry. But you should worry, because you very well might find yourself a pariah in your industry like Peter Molyneux ended up in his. If you’re not familiar, Peter became famous for promising without delivering. One of the most prominent examples of this was E3 2009, where he stood up on stage and introduced Milo. This tech was going to be interactive. It was going to sense what you were doing and respond to it. It turns out it was heavily scripted; the technology did not and still does not exist to do what was being talked about and presented to consumers. Now, anyone with a bit of sense doesn’t listen to Peter.

If the ASR tech worked, why not sell it to us at $10,000 a pop multiplied by the 25,000 stenographers in your graphic and walk away with a cool $250 million? It does what we do, right? So why aren’t we using it? Why aren’t you marketing it to us? It’s got to be a hell of a lot easier to convince 25,000 stenographers than it is to convince 1.3 million lawyers. Sooner or later, Legal Tech News and all the other news people are going to pick up on the fact that what you are selling is hype and hope. So, again, consider a change of direction. Stop propping up STTI, shoot some money over to the organizations that promote stenography, and partner up with steno. You’d be absolutely amazed how short people’s memories are when you’re not advocating for their jobs to be replaced with inferior tech. Take it from somebody who’s done the sleepless nights and endless hours in front of a monitor transcribing: this business isn’t easy. But if you trust stenographers, we’re going to keep making it look easy, and we’re going to make every pro-steno company a lot of money.

What Verbit Investors Need To Know

I had touched pretty gently on Verbit when its series A funding came in at $23 million. The series B funding came in at about $31 million earlier this year. Now Verbit’s announced a strategic partnership with the STTI and professional flip-flopper Jim Cudahy. Migliore & Associates already came out with the hard truth of what this means: ASR doesn’t make the cut for the production of legal transcripts without a qualified court reporter, no matter what you name it (NLP, ASR, AI, computer magic, automated transcription).

Do I come off as angry? I am angry. I’m angry that investors are being led down a path of burning capital where there’s just not a bright future. When the series A funding was happening, Verbit used words like “automated” and “save an enormous amount of manual labor,” and touted “adaptive speech recognition” with over 99 percent accuracy. Now series B is out. They “would not take the human transcriber out”; “the AI will enhance the human.” So investors are fundamentally paying millions of dollars so that they can be another Rev. I doubt very much that that’s what was sold to investors. I don’t think anybody would be putting down millions on that.

Then the partnership with STTI? A complete joke. I have already gone into how, without any doubt, stenographers and NCRA are by far the best equipped to deal with the court reporter shortage. AAERT and the STTI just don’t have the funding, infrastructure, or experience to tackle the problem, and it shows in their data. By their estimates, court reporting companies stand to save $250,000 over the next decade by adopting digital tools. First, I would love to know if this is individual savings or cumulative. We don’t know, because there are no sources linked or cited. If this is cumulative, it’s embarrassing that they would even post that. That would mean $25,000 in savings a year across all companies. If that’s the projected individual savings per company, it’s only slightly less embarrassing. That’s less than the average annual salary of a single court reporter. This may come as a shock to Jim Cudahy, but court reporting companies adopted digital tools throughout digital’s birth in the 70s and into the 80s and 90s. Stenographers are already a part of the Information Age, utilize AI, and produce quality records daily. The idea that investors are going to dump $50 million into “technology” expected to save $250,000 over 10 years and expect a return is terrifying. “Most courts are digital,” again, assuming everything they have to say is true, and yet judicial candidates show a preference for stenographic court reporters and returning them to courtrooms. The growth here is in stenographers, stenography jobs, and stenography schools, and Verbit’s current leadership is missing this boat completely.

Let’s just tell it like it is. When a grassroots-funded stenography blog can give you some pretty solid reasons you’re backing the wrong horse, it’s time to give investors nothing less than what they deserve. Open up a Steno Department, throw down some money on us, and we will make sure you’ve got real and steady returns. Verbit, with proper leadership from Tom Livne, can still save the day. Just not with this bait and switch technology-to-transcription model that amounts to little more than a repackaging of old tech. The only other viable alternative I see is buying this blog for a good $8 million and hoping investors don’t see it before then. Not a difficult decision. Come on over to the winning team. Vote for sten!

Buying Hype

Seems like every day now there’s a new article talking about the great advances of AI transcription. Notice in what I just linked, the author is “Wire Contributor,” which to me means that it’s probably a Trint employee. The September 2019 article goes on to link an April 2017 article where the Wire apparently said something they did was unprecedented.

If you’re not looking at dates and glancing over it, it looks like AI transcription is making leaps and bounds. It’s coming. Their app is to be released at the end of 2019! What will we do? I am here to hopefully get everyone thinking critically. Why are these articles always sporting a technology that’s critically acclaimed but not ready to be publicly released? Because it’s a pitch. It’s an effort to get more investors. It’s a bid to get more people to throw money at it.

Not to get too controversial, but I’ve long watched a YouTuber scientist named ThunderF00t (Phil Mason). He’s made many videos to raise consumer awareness on products and inventions like the Free Electric, Solar Roadways, Zero Breeze, and Fontus. All of these amazing things have a common theme: They sound cool. The media doesn’t understand the concepts behind them. Their creators make positive claims about them. These inventions have had millions of dollars put into them, only for Kickstarter backers and stakeholders to be let down. This is despite walls of positive press from various sites and media forums.

What can we learn? Sellers sell. That’s what they do. When there’s millions of dollars to be made, does the seller really care if the product only meets 90, 80, or 70 percent of the buyer’s needs? Will most buyers spend more time and money holding the seller accountable, or will they eat the loss or attempt to justify the purchase to themselves? That’s why you see claim after claim and never a bad word unless you have colossal levels of fraud, like Theranos. What else can we learn? These things can raise millions of dollars and never hurt a market: Solar Roadways raised over a million dollars and never threatened existing energy companies.

Buying hype can only serve to dampen our morale and make us cede market share. It can only serve to silence us. You don’t have to be a computer scientist to investigate claims about computer science. Let’s start selling facts and raising consumer awareness. If nothing else, remember: If their product worked, you would be using it.

Can Verbit Replace Verbatim?

I had had some thoughts with regard to AI and stenography. I stand by what I said there. Verbit has been, according to online commentators, soliciting people’s business and offering to assist with their workload. There are even some who have said — though I have not seen documentary proof of this — that Veritext is using Verbit or a similar process for their digital reporters. Succinctly, running the audio through a computer program and having a human fix up what the computer spits out. Oddly enough, sounds a lot like what we do when we are taking down every word on a stenotype these days.

The bottom line is these companies are hungry for money. They need revenue to prove to their investors that they are a good investment. Verbit reportedly raised $23 million. Trint reportedly raised at least 150 million euros, or 168 million dollars. That should give you an idea of just how big of an expense it is — in their estimation — to create a program to do what we already do.

When we talk about solving problems, and specifically solving problems with AI and computers, two of the largest jumps in technology are machine learning and modeling the human brain. Modeling the human brain seems an arduous task that is difficult to do on modern hardware. Machine learning means giving the computer training data and then having the computer make “educated” guesses based on that training data.
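Here is a toy sketch of that “training data in, educated guess out” idea. The numbers and words are invented for illustration, and real speech recognition is vastly more complex, but the shape of the process is the same: compare new input against labeled examples and return the closest match.

```python
# Toy machine-learning sketch: guess which word a "sound" is by comparing it
# to labeled training examples. All values are made up for illustration.
training_data = {
    (1.0, 0.2, 0.1): "cat",
    (0.9, 0.3, 0.2): "cat",
    (0.1, 0.8, 0.9): "court",
    (0.2, 0.9, 0.8): "court",
}

def guess(sound):
    # The "educated guess": return the label of the closest training example.
    def distance(example):
        return sum((a - b) ** 2 for a, b in zip(example, sound))
    closest = min(training_data, key=distance)
    return training_data[closest]

print(guess((0.95, 0.25, 0.15)))  # cat   -- close to the "cat" examples
print(guess((0.15, 0.85, 0.85)))  # court -- close to the "court" examples
```

The more labeled examples the system has, the better its guesses can get, which is exactly why these companies want audio paired with finished transcripts.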

So why bring this up again? Well, to caution all of us. The simple truth is the more training data that you give these folks, meaning the more audio files they have that show the computer what we do, the more they’ll be able to sculpt the program. If you make the business decision to help them in that way, that’s fine. But you know what? Demand a premium! There are hundreds of millions of dollars involved in developing these computer programs right now. They should probably be paying YOU to transcribe YOUR work, because quite frankly, if they perfect the program, you might be out of business. If they haven’t yet perfected the program, you’re helping them perfect it! Sounds like a premium service to me.

So make sure everybody out there knows: They don’t want your business, they need it, and they should probably be paying you.

A Word on AI and Stenography

I’ve said this before, but it feels like AI is ubiquitous and in everything these days. That generates a lot of bad press for us stenographers, in that people believe we are or will soon be replaceable. We can further extrapolate from the Pygmalion effect that those beliefs impact reality.

As many know, I’m an amateur programmer. I know relatively little about the top-of-the-line tech and can only code on a very basic level. That said, the more I learn conceptually, the more I’m in awe of just how far computers have come, and how far they have to go. You see it every day on your smartphone and in your steno software. Computers are hard at work and designed to do amazing things.

Here is the thing about computers: They only do what you tell them to do. You have to come up with a set of instructions, an algorithm, that gets them from point A to point B. They solve problems, but only using the instructions you give them. Even if you come up with the instructions, the results can be useless. We can imagine problems as mathematically solvable and insolvable — finite or infinite. An example of an infinite problem is the Fibonacci sequence: each new number is the sum of the two numbers before it, and this stretches into infinity. You can easily write a program to generate Fibonacci numbers, and the computer would die before generating them all, because there are infinitely many of them.
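For what it’s worth, a Fibonacci generator really does fit in a few lines. This is a minimal sketch (not the exact program mentioned later in this post):

```python
# Generate Fibonacci numbers forever: each number is the sum of the two before it.
def fibonacci():
    a, b = 0, 1
    while True:          # the sequence never ends...
        yield a
        a, b = b, a + b  # ...so this loop would run until the machine gave out

gen = fibonacci()
print([next(gen) for _ in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Writing it is trivial; finishing it is impossible, which is the whole point about infinite problems.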

Then there are solvable problems. Chess is considered a solvable problem because it is a game with a finite number of pieces, spaces, and moves. There’s a problem, though. There are so many possible positions in chess that just the datasets covering endgames with 7 pieces left on the board (the Lomonosov tablebases) are said to be 140 terabytes of information. To put that into perspective, it’s been estimated that all the books in the world would fit on about 60 terabytes. Even if you had a supercomputer capable of generating every possible move in chess, the information would be absolutely useless to you, because to digest all of it would be the equivalent of reading every book ever written thousands of times.

So let’s think of AI and audio in terms of problem solving. The most basic way to describe Alexa and Siri is that they listen to you for keywords, check what you say against their database, and decide what to do based on that algorithm we talked about. Let’s face it, there are only maybe 200,000 words in the English language. You could store every single one as a large audio file in less than 700 GB. Here is the deal: computers don’t hear in the traditional sense. They’re taking what you say and presenting educated guesses based on all the data they have. So now, if you will, imagine all 200,000 English language words and every combination they could possibly appear in. The number of possible combinations dwarfs the word count itself. Now let’s add all the different ways words might be said, or all the different scenarios that might interfere with how the computer is “hearing.” Let’s add all the different accents and dialects of English.
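The back-of-the-envelope numbers behind that paragraph, with assumed sizes of my own (roughly 3.5 MB of audio per word gets you to the 700 GB figure):

```python
# Assumed, illustrative numbers -- the point is the scale, not the precision.
words = 200_000
mb_per_word_recording = 3.5   # a generous audio clip for each word

print(words * mb_per_word_recording / 1_000_000, "TB to store one clip per word")  # 0.7 TB

# But recognition isn't lookup. Even ordered strings of just 10 words explode:
print(f"{words ** 10:.2e} possible 10-word sequences")  # ~1.02e+53
```

Storing every word is easy; anticipating every way words combine, sound, and overlap is the part that swallows time and money.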

Let me say this: It is very likely, in my mind, that someday computers will be programmed to hear as well as stenographers in any given situation. It’s a solvable problem. It’s a winnable game. But right now, based on what I know, it will take an indeterminate amount of time and money to get to a point where the technology performs at 95 percent or better in most or all scenarios in a reasonable amount of time. Take for a moment the example of Solar Roadways. Pave the roads in solar panels to solve America’s energy crisis. Millions of dollars were poured into this solution, and it failed. Remember, solvable problem, winnable game. Finite number of people with finite energy needs. Failed anyway. Speech-to-text is estimated to be worth billions of dollars. But what if it takes 100 more years to solve? How many millions or billions of dollars need to be lost before the solution is declared “good enough?” Remember, they can sell Alexa and Dragon today for piles of money. They don’t need 95 percent. The exponential growth of computers has ended, and unless the experts bring us quantum computing or some other huge leap in technology, we’re looking at computers costing more and more to upgrade.

Those companies you see touting transcription AI in 2019 are doubtlessly having transcribers fix AI-prepared transcripts at best. Their game is psychological. It’s not cost saving; it’s cost shifting from the worker to the boss. That’s why it’s not being sold to the public. It’s a magic trick. Look to the left while the magician rolls the coin to the right. It is in our best interest as stenographers to call this out when appropriate, and continue to bolster our own magic skills and industry as the go-to for the hearing impaired and legal communities. Could some geniuses come along and program your replacement next year? Sure. But one thing that you should understand is that it’s not very likely, and buying the hype before they have a product to sell is only going to hurt our morale and livelihoods. We have our method. We have a product. We’ve got more brains, voters, and history in the field. So do yourself and all of us a favor: don’t buy the hype, and the next time you meet a transcriber working for Fake AI Transcription Corp, LLC, tell them they can double their earnings and better themselves by joining the stenographic legion. If a supercomputer is required to solve chess, what do you believe is required to get automatic speech recognition to 95 percent?

May 26, 2019 Edit:
I should add that it’s obvious computers are becoming ruthlessly good at transcribing one speaker, especially in a closed or suitable environment. There are hours of video on that. It’s the introduction of multiple speakers in a less-than-perfect environment where the thing struggles, probably because of all those mathematical issues talked about above.

June 18, 2019 Edit:

A post recently made its rounds on social media claiming a computer science PhD couldn’t see the perfect transcription coming out any time soon. It stands in stark contrast to the claims of some that the technology is already perfect.

August 17, 2019 Edit:

Another article came to light showing that Facebook Messenger and other automatic transcription apps are actually using human transcribers behind the scenes. Using my amateur knowledge of computer coding, I can say this is clear evidence that they need data (the transcriptions) to feed into the machine learning algorithms. Further, if they’re not paying their transcribers exceptionally well and bad data is being inputted, it could ultimately make automatic transcription programs worse. Expect some pretty big delays on the AI transcription front.

August 25, 2019 Edit:

I had created a “mock voice recognition video” just to prove how easy it would be for a company to lie about its voice recognition progress. I coded a computer program that spits back whatever text you give it at a set words per minute. So next time you’re at an automatic transcription demonstration, ask yourself if what you’re seeing is automatic or staged. I often give the example of Project Natal and Peter Molyneux. Gamers were made to believe that the Milo demonstration of Project Natal was a showcase of technology that was coming out. The truth broke years later that the demonstration was heavily scripted, and over ten years later, no such technology exists. Similarly, when someone tells you that their audio transcription program is flawless — question whatever you’re seeing and realize how easy it is to stage and sell things.
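The post doesn’t include that program’s code, but the idea fits in a dozen lines. Here is a minimal sketch of what such a staged demo could look like: feed it a pre-written script and it “transcribes” the text back at a chosen words-per-minute, with no microphone involved at all.

```python
import time

def fake_asr_demo(script: str, wpm: int = 180) -> None:
    """Print a pre-written script one word at a time, paced to look like live recognition."""
    delay = 60.0 / wpm                 # seconds per word at the chosen speaking rate
    for word in script.split():
        print(word, end=" ", flush=True)
        time.sleep(delay)
    print()

# Looks like "live" speech recognition on screen; it's really just replaying a script.
fake_asr_demo("The witness is sworn and the deposition may proceed", wpm=200)
```

If a blogger can stage that in an afternoon, a funded company can certainly stage something far more convincing.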

Steno V Digital (Archive Post)

Consider this a gentle touch on an important topic. There’s been a true memetic shift in the way stenographers are interacting and spreading ideas. Content is honestly popping up faster than we can even really digest it, so let this post serve as a staging point for some of what’s happening this Court Reporting and Captioning Week 2019. This weekend I’ve had the pleasure of reading a flyer from the DRA in California (Photo Archive). I read about Idaho’s need for reporters (Photo Archive). Finally, I got to see Cleveland Reporting Partners’ whole take on digital v steno (Photo Archive).

In very brief summary we are seeing many people put into writing what I have opined over Facebook. Yes, technology is amazing. But right now it struggles with certain things. It can transcribe one speaker quite well, but if you throw in some stray sounds or a second speaker, it can have a hard time. This makes the market for captioners and legal reporters a little more promising because we have the skill and training to give them what they need now and train others to do it. Make no mistake, there’s a big market in that, so if a company is having you train a digital, make sure you’re getting at least the next ten years of your annual income upfront.

Technological growth is no longer exponential. Don’t get me wrong, it’s impressive. But until quantum computing is cheap and accessible, there are probably things we won’t see, like a JARVIS-like AI. We will see imitation AI, that’s for sure, but there is an indeterminate clock on when we will see quantum tech. The approach most likely to lead to real progress right now is machine learning. This takes training data sets, like pictures, or recordings, or text (whatever it is programmed to take), and uses that information as a basis for its decisions. Sometimes this is entertaining. Sometimes this goes horribly wrong. The bottom line is it is limited by the speed at which it can process its training data and the speed at which it can retrieve that information later.

I imagine that the training data set for an AI to “do depositions” would look something like recorded depositions paired with their transcripts. There are three big hurdles there: building the training set, processing the training set, and retrieving the right data when it’s time to “do depositions.” In a classical computer we have, in very lay terms, little transistors switching on and off to tell the computer what’s going on. Tech is running into a problem where it can’t get these transistors much smaller, and making bigger processors draws more electricity. For example, I wrote a Fibonacci-generating program. The basic concept is that each number is the sum of the two numbers before it. The computer is happy to make these calculations, but very quickly the numbers grow so large that the processing power needed to calculate them runs dry, and the files we store them in become too large to open on a weak laptop. One of the simplest algorithms in existence busts up a classical computer. This is probably the trouble they have making something that can seamlessly listen to people and transcribe: the computer just doesn’t have the power to process it quickly. Look how long it takes videographers to burn discs or GoToMeeting to process audio. Now imagine adding another layer where the machine is transcribing everything perfectly. In quantum computing, they’re talking about these very small units being able to calculate everything at once, or large batches of things at once. If they crack that, we’re probably back to exponential technological growth.

In the meantime, fight for your jobs. Fight for market share. It’s not a question of whether we’re outdated; today the answer to that is no. What matters heavily is perception. Perception can change outcomes. One of the most effective tactics in war has been to get the enemy army to rout, and that’s exactly what digital reporting advocates are trying to get you to do: give up and go home without a fight. Don’t buy into it; make the technology prove itself. Even the worst stenographer puts in words four or five times faster than the average typist, yet there are still typists.

Keep competing. We are well on our way to winning this thing.