Much of my writing has been built around a serious revelation: the belief that automation will take away jobs is killing industries. I focus on court reporting, but it’s happening everywhere. People are scared to become truckers because of Elon Musk’s claims that he will automate trucking. I’ve looked to many other industries to illustrate this. Take how it was assumed Uber would take over the world, yet it has never turned a profit, or how it was assumed Theranos would revolutionize blood testing, when it was all a big scam. That hype had real-world consequences. To this day, the value of taxi medallions in New York City remains decimated thanks to Uber, even though Uber is, as of yet, not a sustainable business model. Technological hype can do big damage.
And you know what I find, reading that article? They have just as much fear as we do. Their AI-centered businesses can fail all the same. They can burn through $20 million in late-90s money and still walk away with no real product.
They have the same issue with charismatic figures promising or claiming things that have little or no basis in reality.
Unrealistic expectations can absolutely destroy their field. In ours, this plays out as people not believing it is a viable job. In theirs, this could play out as investors taking all that money propping them up and going somewhere else with it. This has happened before and is referred to as “AI winter.”
There’s a lot to be learned by looking directly at what’s going on in technology today. Perhaps most pressing for us is the realization that there is not some kind of magic unending growth built into technological progress. The last century, and particularly the last couple of decades, changed humanity. Technology exploded from no TV, to black and white TV, to the home entertainment centers we have today. Many of us operate under the belief that technology will always grow at that pace. We are encouraged to think that not only by our collective experience, having lived through that technological leap, but also by the people who stand to gain the most from our buying into that belief and investing in it.
So what we are left with is the same thing I have been writing about for years. The digital reporting stuff is not about efficiency, technology, or anything particularly new or special. It’s about worker exploitation. It’s about moving the field away from one that has a strong support system to one that has no support system or where the support system is controlled by the business owners. It’s about getting you court reporters to believe “technology magic” is taking away your job so that you don’t fight to keep it. In reality, there was a genuine attempt to shift our NCRA that way with Plan B. That failed. We got NCRA 2.0. NCRA 2.0 balanced the budget and put its members before its corporate sponsors, which only US Legal corporate reps appear to have a problem with. Since the corporate powers that be couldn’t get NCRA to kill our industry for us, they threw a tantrum and started pretending NCRA didn’t exist.
Seems like conspiracy theory territory! Except it’s no theory. Check out Benjamin Jaffe. He wrote a whole article this year about how digital is the answer to our shortage. But he’s affiliated with BlueLedge.
And what is BlueLedge? A training provider for digital reporting that is basically pretending stenographers don’t exist.
To use the words of Dineen Squillante, we are being “out-marketed” rather than losing by any objective metric. The entire game on the digital reporting side has been exposed: “we are going to push our version of the future. We don’t care who this hurts.” Guess what, stenographic reporters? Our nonprofits are better funded. Our social groups and support systems are bigger. Our student pools are larger and more invested. All we have left to do is acknowledge our own collective bigness and put our thumb on the scale. We need to start being very vocal about our industry and the projects we are working on.
To that end, if you have court reporting or captioning industry news and you’d like to get it reprinted, please contact me at ChristopherDay227@gmail.com. We can work out a deal where I can use the skills I have built to get your work some extra exposure, you can get your stuff in circulation, and I can use some of the profit to create more steno advertising rather than rely on the incredible generosity of donations. A price point of $200 to $400 per news event is the target. It’s calculated to keep this venture going strong and beat out the deceptive marketing from the digital camp. Even if your organization cannot meet the $200 cost, please reach out. A lack of publicity pushed our field down to where it is today. We can reverse direction there, but we won’t reverse that without a little time, effort, and togetherness.
Yesterday I noted the study on racial disparities in automatic speech recognition and how modern ASR did worse than the estimates provided in an old patent. I also noted humans are built to get better at just about anything they do. I just so happen to think about this court reporting and automatic speech recognition stuff a lot. It finally hit me why automatic speech recognition has made little real progress in the last 20 years: language drift. The way that people speak and write English tends to change over time. Great example? I’m a gamer, but I’m not entrenched in gamer culture. When someone about six years younger than me said “I’m getting bodied,” I had almost no clue what he was talking about. He was getting beat up by the other team! If you take a look at the video I linked, it explains how words and nomenclature have changed drastically in English. Early English, to me, sounded much more French than anything we know today. If you go back only about 650 years, you reach a point where you are unlikely to understand the English language. Giraffes used to be camelopards. “Verily” used to be a word that people used. Even worse, there was no electricity to charge our stenotypes yet. To the chagrin of English purists, language drift appears inevitable. But this is also why we need real people studying and mastering English. It gives the rest of us a fighting chance. That’s why a computer program could never do for court reporting what Margie Wakeman Wells did. The computer would only regurgitate the same rules again and again, never reviewing or assessing new information unless a real person told it to.
What does that have to do with automatic speech recognition and court reporting? Our verbal and written languages are changing over time. That’s why literally now means figuratively, literally! ASR is based on machine learning. It’s unlikely to ever perfect English because English is ever evolving and never perfect. Let’s say a company compiles enough data and creates an algorithm so perfect that it can accurately understand every single one of the billions of speakers on the planet today. Every single day after that moment, speech patterns would drift a little further, until one day the language would be unrecognizable to the system. Of course, there is not a single country or corporation on the planet allocating enough money or personnel to gather that much data in the first place!
As a secondary matter, as far as I know, a system trained to understand all English dialects is inherently less likely to work than a system trained to understand only standard English. I’ve written extensively about how bad ASR was with AAVE, with accuracy as low as 25%. If we train a system on AAVE and data suited for it, there is a high likelihood that it would have worse accuracy for standard speakers. Gain ground on one type of speaker and lose ground on the other. The main way to compensate for that would be to have a trained operator use a specific voice profile to select the speaker. Guess what? That’s voice writing, something our industry figured out two decades ago.
This is not to say we shouldn’t continue to train and be at the top of our game. But my thoughts on AI are shifting from what they were. I used to believe there was some small possibility we would be replaced. I am coming to a place where I do not see us as replaceable under the current model of ASR without a trained operator in every seat. If we’re going to do that, stenography is the way to go!
Thank you to recent donors. My PayPal is open to receive donations for those that wish to contribute to the cost of running the blog. If you don’t want to give something for “nothing,” I also designed a Sad Iron Stenographer mug on Zazzle. The cheaper one, I will make about $0.90 for every sale. The more expensive one, I will make about $10 for every sale. They are both identical mugs, so buy whichever you find to be more appropriate. Nothing will make your Mondays happier than the sad iron stenographer, I guarantee* it.
In our field there are three main modalities for taking the record or captioning: stenography, voice writing, and digital recording. Stenography is using a chorded stenotype and computer dictionary to instantaneously take down and transcribe the spoken word. Digital recording is all about letting a microphone pick up the audio and having somebody transcribe it after the fact. Sometimes digital recording proponents insist that they can run the audio through automatic speech recognition (ASR) systems to “assist the transcriber.” I’ve been pretty open about my feelings there.
There are also nonprofits representing each modality. NCRA is all-in for steno. NVRA admits stenographers, but in my mind is really more for voice writers, and rightfully so. AAERT is pro-recording. ATSP is pro-transcriber to the extent it has any court reporting industry presence. There are others like Global Alliance or STTI that claim to be for all three modalities, but I’ve always gotten a “jack of all trades, master of none” vibe from those types of associations.
From information available to me, I believe that NCRA is by far the largest organization and in the best position to handle the court reporter shortage, but NVRA does provide an incredibly important role in certifying voice writers. One common problem in the early years of voice writing, which some New York attorneys still hold against them, was that occasionally they could be heard through the mask. Even now, when there is a lot of sibilance, one can infrequently hear a voice writer through the mask. Modern certification requires that the voice writer is able to perform without being heard, and a two-strike policy is employed in which the first time a writer is heard during a test they are tapped on the shoulder. The second time they are heard, they are disqualified. Voice writing tests, like ours, give the voice writer one shot at getting their “voice notes” correct. They are not allowed to repeat or review the test audio. This kind of testing is important and represents the quality standards this industry needs. NVRA confirmed its testing policy in an 8/11/21 e-mail to me.
Most reporters know that voice writing is, at its core, speaking into a Stenomask or other voice mask and allowing automatic speech recognition to assist in the transcription of what’s said. In some settings, a voice writer may use an open mic. Some stenographic reporters may be surprised to learn that realtime voice writing is superior to digital reporting and general ASR use. In general ASR use, the microphone takes input from everyone and the computer system gives its best guess based on the training data it has. In a study from last year, it was shown that that technology’s accuracy could drop as low as 25% depending on who is speaking. Realtime voice writing, by comparison, is a trained operator, the voice writer, often speaking into a closed microphone, and utilizing ASR that has been trained to that writer’s voice. In the best of circumstances, that ASR can reliably put out highly accurate transcriptions of the voice writer’s voice — as high as 98%. Many realtime voice writers utilize Dragon by Nuance connected to their preferred CAT software. I guesstimate that Nuance has the best ASR tech, and it’s no coincidence that despite all the other ASR vendors out there, Nuance is the one Microsoft wanted to buy. This lead in technology comes from the system being trained to understand the specific user or voice writer.
One important distinction is the difference between realtime voice writers and voice writers that speak into the mask and have someone else transcribe and do the work. This is very similar to the divide in stenographic reporting where some scopists report having to fill in huge chunks of information missed by the court reporter. A realtime voice writer, like a realtime stenographer, does not have to provide realtime services, but they do maintain the equipment and capability to do so.
The knowledge and preparedness of the voice writer is integral to the integrity of the record produced. Think of all the glitches and anomalies in stenographic CAT software. Think about how reporters create macros and dictionary workarounds every day to deal with them. As an easy example, my software does not like certain punctuation marks to be together. Early in my career, I worked out that placing a backslash between the two marks and then deleting it would override the software’s programming to delete punctuation. Similarly, voice writers have to deal with the complexities of the ASR system, the CAT software, and how they interact in order to overcome word boundary and formatting issues.
The understanding and maintenance of a voice writer’s equipment is also paramount. How the computer “hears” a writer’s voice in one microphone can be vastly different than another microphone. Different masks can be given different training configurations to enhance the ASR transcription. Voice writers are speaking into a mask, and where saliva or liquid gets into the mask it can alter what the computer hears. The competent voice writer monitors their realtime and keeps redundant equipment in case of an equipment failure, including extra masks and multiple audio backups of their “voice notes.” As someone who keeps two stenotypes in case one decides to die mid-trial, I admire the voice writers that take the time to ensure the show goes on in the event of computer problems.
Like us, voice writers use many briefs or triggers. The key difference is that they must speak the “steno.” The same way we must come up with a stroke for designating a speaker, they must come up with a voice command. The same way that stenographers must differentiate the word “period” from the punctuation symbol of a period, voice writers historically had to create differentiations. For example, in years gone by, they might have had to say “peerk” for the symbol and “period” for the word. Modern ASR systems are sometimes able to differentiate the word versus the mark without any special command or input from the voice writer! Again, the experience and ability to predict how the software will interpret what is said is an important skill for the realtime voice writer.
The obvious question arises as to why this blog tends to be silent on voice writing. There’s no overt hostility there; in fact, I have deep admiration for the people at the top of the voice writing modality of record taking. Simply put, I truly believe that stenographic reporting is better and will open more doors for students. That’s colored by my own experiences. As of today, voice writers are not allowed to work in my court and be in my civil service title. We can argue about whether they should be allowed, but the simple fact is that New York courts today tend to utilize stenographic reporting or digital recording. It’s easy to see that the qualified voice writer is a far better choice than digital recording, but I couldn’t say to a student “get into voice writing! You’ll have the same opportunities as I do!”
There is a tumultuous history between stenographic court reporters and voice writers. I’ve been told by multiple NCRA members that when an effort was made to include voice writers about two decades ago, there was heavy backlash and even some harassment against those who were pro-integration. That was the climate of yesterday. While it seems unlikely that there will be a formal alliance, inclusion, or cooperation, the separation we see today is not the same violent rejection of voice writers from the early 2000s. The civility of NCRA’s 2021 business meeting showed that court reporters are ready to disagree without belligerence and keep our industry moving forward. This is more akin to why the North American Olive Oil Association probably doesn’t partner much with the Global Organization for EPA and DHA Omega-3s. Olive oil and fish oil are both fine oils, but every second and cent spent advocating for one could be spent advocating for the other. It doesn’t make much sense to divide the time and resources. That’s where we are today. What the future holds for tomorrow, I can only imagine.
A big thank you to everyone that made this article possible, up to and including the NVRA. One source of my information was the esteemed Tori Pittman. Trained in both stenography and voice writing, Tori gave me a full demonstration of voice writing and agreed to speak at length about voice writing. See the full interview below!
A series of 2019 predictions by Gartner was reported on by Venture Beat on June 28, 2021. As explained in a prior post, “AI”, or machine learning, relies on datasets and algorithms. If the data is imperfect or incomplete, a computer has a chance of giving bad output. If the algorithm that tells the computer what to do with the data is imperfect, the computer has a chance of giving bad output. It’s easy to point to anecdotal cases where “AI” makes a bad call. There have been reports of discrimination in facial recognition technology, driverless cars killing people, or Amazon’s algorithm deciding to fire drivers that are doing their job. I’ve seen plenty of data on the failings of overhyped technology and commercial ASR. What I hadn’t seen prior to today was somebody willing to put a number on the percentage of AI solutions that succeed. Today, we have that number, and it’s an abysmal 15%.
Perhaps this will not come as a surprise to my readers, considering prior reports that automatic speech recognition (ASR), an example of machine learning, is only 25 to 80 percent accurate depending on who’s speaking. But it will certainly come as a surprise to investors and companies that are dumping money into these technologies. Now there’s a hard number to consider. And that 15% itself is misleading. It’s a snapshot of the total number of implementations, not just ASR. ASR comprises a percentage of the total number of implementations out there. And it’s so bad that some blogs are starting to claim word error rate isn’t really that important.
That 15% is also misleading in that it’s talking about solutions that are implemented successfully. It is not talking about implementations that provide a positive return on investment (ROI). So imagine having to go to investors and say “our AI product was implemented with 100% success, but there’s still no money in this.”
The Venture Beat article goes on to describe several ways to make AI implementation a success, and I think it’s worth examining them briefly here.
1. Customizing a solution for each environment. No doubt that modeling a solution for every single business individually is bound to make that solution more successful, but it’s also going to take more staff and money. This would be almost like every court reporting company having their own personal software development staff to build their own CaseCAT or Eclipse. Why don’t they do that? It’s hopelessly expensive.
2. Using a robust and scalable platform. The word robust doesn’t really mean anything in this context. Scalability is tied to modular design — the ability to swap out parts of the program that don’t work for specific situations. For this, you need somebody bright and forward thinking. They have to have the capability to design something that can be modified to handle situations they may not even be aware exist. With the average software engineer commanding in the ballpark of $90,000 a year and the best of them making over $1 million a year, it’s hopelessly expensive.
3. Staying on course once in production. This involves reevaluating and sticking with something that may appear to be dysfunctional. This would be almost like the court reporter coming to the job, botching the transcript, and the client going “yes, I think I’ll use that guy again so that I can get a fuller picture of my operational needs.” It’s a customer service nightmare.
4. Adding new AI use cases over time. Piggybacking on number 3, who is going to want to continue to use AI solutions to patch what the first solution fails to address? This is basically asking businesspeople to trust that it will all work out while they burn money and spend lots of time putting out the fire. It’s a customer service nightmare.
I really respect Venture Beat trying to keep positive about AI in business, even if it’s a hopelessly expensive customer service nightmare.
With some mirth, I have to point out to those in the field that believe the stenographer shortage is an insurmountable problem that we now know machine learning in the business world has a failure rate that’s right up there with stenographic education’s failure rate. Beyond the potential of exploiting digital reporters or stealing investor money, what makes this path preferable to the one that has worked for the last hundred years? As I wrote a week ago, the competition is going to wise up. Stenographic court reporters are the sustainable business model in this field, and to continue to pretend otherwise is nothing short of fraud.
There’s a lot of conjecture when it comes to automatic speech recognition (ASR) and its ability to replace the stenographic reporter or captioner. You may also see ASR lumped in with NLP, or natural language processing. An important piece of the puzzle is understanding the basics behind artificial intelligence and how complex problems are solved. This can be confusing for reporters because in any of the literature on the topic, there are words and concepts that we simply have a weak grasp on. I’m going to tackle some of that today. In brief, computer programmers are problem solvers. They utilize datasets and algorithms to solve problems.
What is an algorithm?
An algorithm is a set of instructions that tell a computer what to do. You can also think of it as computer code for this discussion. To keep things simple, computers must have things broken down logically for them. Think of it like a recipe. For example, let’s look at a very simple algorithm written in the Python 3 language:
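The snippet itself didn’t survive the reprint, so here is a minimal reconstruction based on the description that follows. The function wrapper is my addition so the example can run without a live prompt; the original took its answer straight from input():

```python
def quiz(answer):
    # Put the words "The stenographer is _." on the screen.
    print("The stenographer is _.")
    # The Stenographer is equal to whatever was typed in.
    Stenographer = answer
    # Accept "awesome" with a lowercase or uppercase "a".
    if Stenographer in ("awesome", "Awesome"):
        return "You are right!"
    return "The correct answer was awesome."

# The original simply ran the check on quiz(input()).
```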
Line one tells the computer to put the words “The stenographer is _.” on the screen. Line two creates something called a Stenographer, and the Stenographer is equal to whatever you type in. If you input the word awesome with a lowercase or uppercase “a” the computer will tell you that you are right. If you input anything else, it will tell you the correct answer was awesome. Again, think of an algorithm like a recipe. The computer is told what to do with the information or ingredients it is given.
What is a dataset?
A dataset is a collection of information. In the context of machine learning, it is a collection that is put into the computer. An algorithm then tells the computer what to do with that information. Datasets will look very different depending on the problem that a computer programmer is trying to solve. As an example, for enhancing facial recognition, datasets may be composed of pictures. A dataset may be a wide range of photos labeled “face” or “not face.” The algorithm might tell the computer to compare millions of pictures. After doing that, the computer has a much better idea of what faces “look like.”
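To make that concrete, here is a toy sketch of what such a labeled dataset could look like in Python. The filenames and labels are purely illustrative, not from any real facial recognition system:

```python
# A dataset is just a collection of labeled examples. Filenames stand in
# for real image data here.
dataset = [
    ("photo_001.jpg", "face"),
    ("photo_002.jpg", "not face"),
    ("photo_003.jpg", "face"),
    ("photo_004.jpg", "not face"),
]

# Count how many examples of each label the computer gets to "practice" on.
# An imbalanced count is one way bias creeps into the trained system.
counts = {}
for _, label in dataset:
    counts[label] = counts.get(label, 0) + 1

print(counts)  # {'face': 2, 'not face': 2}
```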
What is machine learning?
As demonstrated above, algorithms can be very simple steps that a computer goes through. Algorithms can also be incredibly complex math equations that help a computer analyze datasets and decide what to do with similar data in the future. One issue that comes up with any complex problem is that no dataset is perfect. For example, with regard to facial recognition, there have been situations with almost 100 percent accuracy with lighter male faces and only 80 percent accuracy with darker female faces. There are two major ways this can happen. One, the algorithm may not accurately instruct the computer on how to handle the differences between a “lighter male” face and a “darker female” face. Two, the dataset may not equally represent all faces. If the dataset has more “lighter male” faces in this example, then the computer will get more practice identifying those faces, and will not be as good at identifying other faces, even if the algorithm is perfect.
Artificial intelligence / AI / voice recognition, for purposes of this discussion, are all synonymous with each other and with machine learning. The computer is not making decisions for itself, like you see in the movies; it is being fed lots of data and using that to make future decisions.
Why Voice Recognition Isn’t Perfect and May Never Be
Computers “hear” sound by taking the air pressure from a noise into a microphone and converting it to electronic signals so that it can be stored or played back through a speaker. A dataset for audio recognition might look something like a clip of someone speaking paired with the words that are spoken. There are many factors that complicate this. Datasets might be focused on speakers that speak in a grammatically correct fashion. Datasets might focus on a specific demographic. Datasets might focus on a specific topic. Datasets might focus on audio that does not have background noises. Creating a dataset that accurately reflects every type of speaker in every environment, and an algorithm that tells the computer what to do with it, is very hard. “Training” the computer on imperfect datasets can result in a word error rate of up to 75 percent.
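For readers who want to see where a figure like “75 percent word error rate” comes from, here is a minimal sketch of the textbook definition: the number of word substitutions, deletions, and insertions needed to turn the computer’s transcript into the true one, divided by the number of words actually spoken. This is the standard formula, not any particular vendor’s implementation:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first j hypothesis words
    # into the first i reference words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

By this measure, a transcript that drops half the spoken words scores a 0.5 (50 percent) error rate, which is why the accuracy and word error rate figures in these studies roughly mirror each other.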
This technology is not new. There is a patent from 2000 that seems to be a design for audio and stenographic transcription to be fed to a “data center.” That patent was assigned to Nuance Communications, the owner of Dragon, in 2009. From the documents, as I interpret them, it was thought that 20 to 30 hours of training could result in 92 percent accuracy. One thing is clear: as far back as 2000, 92 percent accuracy was in the realm of possibility. As recently as April 2020, the data studied from Apple, IBM, Google, Amazon, and Microsoft was 65 to 80 percent accuracy. Assuming, from Microsoft’s intention to purchase Nuance for $20 billion, that Nuance is the best voice recognition on the market today, there’s still zero reason to believe that Nuance’s technology is comparable to court reporter accuracy. Nuance Communications was founded in 1992. Verbit was founded in 2016. If the new kid on the block seriously believes it has a chance of competing, and it seems to, that’s a pretty good indicator that Nuance’s lead is tenuous, if it exists at all. There’s a list of problems for automation of speech recognition, and even though computer programmers are brilliant people, there’s no guarantee any of them will be “perfectly solved.” Dragon trains to a person’s voice to get its high level of accuracy. It simply would not make economic sense to spend hours training the software on everyone who is ever going to speak in court, forever until the end of time, and the process would be susceptible to sabotage or mistake if it was unmonitored and/or self-guided (AKA cheap).
This is all why legal reporting needs the human element. We are able to understand context and make decisions even when we have no prior experience with a situation. Think of all the times you’ve heard a qualified stenographer, videographer, or voice writer say “in 30 years, I’ve never seen that.” For us, it’s just something that happens, and we handle whatever the situation is. For a computer that has never been trained with the right dataset, it’s catastrophic. It’s easy, now, to see why even AI proponents like Tom Livne have said that they will not remove the human element.
Why Learning About Machine Learning Is Important For Court Reporters
Machine learning, and the applications fueled by it, are very likely to become part of our stenographic software. If you don’t believe me, just read this snippet about Advantage Software’s Eclipse AI Boost.
If you’ve been following along, you’ve probably figured out, and it pretty much lays it out here, that datasets are needed to train “AI.” There are a few somewhat technical questions that stenographic reporters will probably want answered at some point:
1. Is this technology really sending your audio up to the Cloud and Google?
2. Is Google’s transcription reliable?
3. How securely is the information being sent?
4. Is the reporter’s transcription also being sent up to the Cloud and Google?
The reasons for answering?
1. The sensitive nature of some of our work may make it unsuitable for being uploaded. To the extent stuff may be confidential, privileged, or ex parte, court reporters and their clients may simply not want the audio to go anywhere.
2. Again, as shown in “Racial disparities in automated speech recognition” by Allison Koenecke, et al., Google’s ASR word error rate can be as high as 30 percent. Having to fix 30 percent of a job is a frightening possibility that could be more a hindrance than a help. I’m a pretty average reporter, and if I don’t do any defining on a job, I only have to fix 2 to 10 percent of any given job.
3. If we assume that everyone is fine with the audio being sent to the cloud, we must still question the security of the information. I assume that the best encryption possible would be in use, so this would be a minor issue.
4. The reporter’s transcription carries not only all the same confidential information discussed in point 1, but also would provide helpful data to make the AI better. Reporters will have to decide whether they want to help improve this technology for free. If the reporter’s transcription is not sent up with the audio, then the audio would only ostensibly be useful if human transcribers went through the audio, similar to what Facebook was caught doing two years ago. Do we want outside transcribers having access to this data?
Our technological competence changes how well we serve our clients. Nobody reading this needs to become a computer genius, but being generally aware of how these things work and some of the material out there can only benefit reporters. In one of my first posts about AI, I alluded to the fact that just because a problem is solvable does not mean it will be solved. I didn’t have any of the data I have today to assure me that my guess was correct. But I saw how tech news was demoralizing my fellow stenographers, and I called it as I saw it even though I risked looking like an idiot.
It’s my hope that reporters can similarly let go of fear and start to pick apart the truth about what’s being sold to them. Talk to each other about this stuff, pros and cons. My personal view, at this point, is that a lot of these salespeople saw a field with a large percentage of women sitting on a nice chunk of the “$30 billion” transcription industry, and assumed we’d all be too risk averse to speak out on it. Obviously, I’m not a woman, but it makes a lot of sense. Pick on the people that won’t fight back. Pick on the people that will freeze their rates for 20 or 30 years. Keep telling a lie and it will become the truth because people expect it to become the truth. Look how many reporters believe audio recording is cheaper even when that’s not necessarily true.
Here’s my assumption: a little bit of hope and we’ve won. Decades ago, a scientist named Richter did an experiment where rats were placed in the water. It took them a few minutes to drown. Another group of rats were taken out of the water just before they drowned. The next time they were submerged, they swam for hours to survive. We’re not rats, we’re reporters, but I’ve watched this work for humans too. Years ago, doctors estimated a family member would live about six more months. We all rallied around her and said “maybe they’re wrong.” She went another three years. We have a totally different situation here. We know they’re wrong. Every reporter has a choice: sit on the sideline and let other people decide what happens or become advocates for the consumers we’ve been protecting for the last 140 years, before the stenotype design we use today was even invented. People have been telling stenographers that their technology is outdated since before I was born, and it’s only gotten more advanced since that time. Next time somebody makes such a claim, it’s not unreasonable for you to question it, learn what you can, and let your clients know what kind of deal they’re getting with the “new tech.”
Some readers checked in about the Eclipse AI Boost, and as it was relayed to me, the agreement is that Google will not save the audio and will not be taking the stenographic transcriptions. Assuming that this is true, my current understanding of the tech is that stenographers would not be helping improve the technology by using it, unless there’s some clever wordplay going on, along the lines of “we’re not saving the audio, we’re just analyzing it.” At this point, I have no reason to suspect that kind of a game. In my view, our software manufacturers tend to be honest because there’s simply no truth worth getting caught in a lie over. The worst I have seen are companies using buzzwords to try to appease everyone, and I have not seen that from Advantage.
Admittedly, I did not reach out to Advantage myself, because this was meant to help reporters understand the concepts rather than to be a news story. But I’m very happy people took that to heart and started asking questions.
As a stenographic court reporter, I have been amazed by the strides in technology. Around 2016, I, like many of you, saw the first claims that speech recognition was as good as human ears. Automation seemed inevitable, and a few of my most beloved colleagues believed there was not a future for our amazing students. In 2019, the Testifying While Black study was published in the journal Language. The study and its pilot studies showed that court reporters were twice as good at understanding the AAVE dialect as the average person, even though we have no training whatsoever in that dialect; still, the news media focused on the fact that we certify at 95 percent and yet only had 80 percent accuracy in the study. Some of the people involved with that study, namely Taylor Jones and Christopher Hall, introduced Culture Point, just one provider that could help make that 80 percent so much higher. In 2020, a study from Stanford showed that automatic speech recognition had a word error rate of 19 percent for “white” speakers, 35 percent for “black” speakers, and “worse” for speakers with a high dialect density. How much worse?
75 percent word error rate in a study done three or four years after the first claim that automatic speech recognition had 94 percent accuracy. But in all my research and all that has been written on this topic, I have not seen the following point addressed:
What Is An Error?
NCRA, many years ago, set out guidelines for what constitutes an error. Word error guidelines take up about a page. Grammatical error guidelines take up about a page. What this means is that when you sit down for a steno test, you’re not being graded on your word error rate (WER); you’re being graded on your total errors. We have decades of failed certification tests where a period or comma meant a reporter wasn’t ready for the working world yet. Even where speech recognition is amazing on WER, I’ve almost never seen appreciable grammar, punctuation, Q&A, or any of the other things we do to make the transcript readable. It’s so bad that advocates for the deaf, like Meryl Evans, refer to automatic speech recognition as “autocraptions.”
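To make the distinction concrete, here’s a minimal sketch (with made-up sentences and a deliberately simplified tokenizer) showing how the very same transcript can score a perfect WER while still racking up errors under a steno-style total-error grading that counts punctuation:

```python
# Illustrative sketch only: WER versus a "total errors" grade.
# The sentences and tokenization below are hypothetical, for demonstration.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def error_rate(ref_tokens, hyp_tokens):
    """Errors (substitutions + insertions + deletions) per reference token."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

def words_only(tokens):
    """Drop punctuation tokens, the way a plain WER comparison does."""
    return [t for t in tokens if t.isalnum()]

reference  = "Q Did you see him ? A No , I did not .".split()
hypothesis = "Q Did you see him . A No I did not ?".split()

# Graded on words alone, the hypothesis is "perfect":
print(error_rate(words_only(reference), words_only(hypothesis)))  # 0.0

# Graded on every token, punctuation included, the errors surface:
print(error_rate(reference, hypothesis))  # 3 errors / 13 tokens, about 0.23
```

The word-only score says flawless; the total-error score flags a question mark turned into a period, a dropped comma, and a period turned into a question mark. On a certification test, any one of those could be the difference between passing and failing.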
Unless the bench, bar, and captioning consumers want word soup to be the standard, the difference in how we describe errors needs to be injected into the discussion. Unless we want to go from a world where one reporter, perhaps paired with a scopist, completes the transcript and is accountable for it, to a world where up to eight transcribers are needed to transcribe a daily, we need to continue to push this as a consumer protection issue. Even where regulations are lacking, this is a serious and systemic issue that could shred access to justice. We have to hit every medium possible and let people know the record — in fact, every record in this country — could be in danger. The data coming out is clear. Anyone selling recording and/or automatic transcription says 90-something percent accuracy. Any time it’s actually studied? Maybe 80 percent accuracy, maybe 25; maybe they hire a real expert transcriber, or maybe they outsource all their transcription to Kenya or Manila. Perception matters; court administrators are making industry-changing decisions based on the lies or ignorance of private sector vendors.
The point is that recording equipment sellers are taking a field which has been refined by stenographic court reporters into a fairly painless process, with clear guidelines for what happens when something goes wrong, adding lots of extra parts to it, and calling it new. We’ve been comparing our 95 percent total accuracy to their “94 percent,” which counts only word errors. In 2016, perhaps there were questions that needed answering. This is April 2021: there’s no contest, and proponents of digital recording and automatic transcription have a moral obligation to look at the facts as they are today and not as they’d like them to be.
During our Court Reporting & Captioning Week 2021 there were a couple of press releases and some press releases dressed up as journalism all about digital recording, automatic speech recognition, and its accuracy and viability. There’s actually a lesson to be learned from businesses that continually promise without any regard for reality, so that’s what I’ll focus on today. I’ll start with this statement. We have a big, vibrant field of students and professionals where everyone that is actually involved in it, from the smallest one-woman reporting armies to the corporate giants, says technology will not replace the stenographic court reporter. Then we have the tech players who continuously talk about how their tech is 99 percent accurate, but can’t be bothered to sell it to us, and whose brilliant plan is to record and transcribe the testimony, something stenographers figured out how to do decades ago.
You know the formula. First we’ll compare this to an exaggerated event outside the industry, and then we’ll tie it right into our world. So let’s breeze briefly over Fyre Festival. To put it in very simple terms, Fyre Festival was an event where the CEO overpromised, underdelivered, and played “hide the ball” until the bitter end. Customers were lied to. Investors were lied to. Staff and construction members were lied to. It was a corporate fiasco propped up by disinformation, investor money, and cash flow games that ended with the CEO in prison and a whole lot of people owed a whole lot of money that they will, in all likelihood, never get paid. It was the story of a relative newcomer to the industry of music festivals saying they’d do it bigger and better. Sound familiar?
As for relative newcomers in the legal transcription or court reporting business, take your pick. Even ones that have been incorporated for a couple of decades really aren’t that impressive when you start holding up the magnifying glass. Take, for example, VIQ Solutions and its many subsidiaries:
VIQ apparently trades OTC, which gives us a rare glimpse of financial information that we don’t get with a lot of private companies. Right off the bat, we can see some interesting stuff: $8 million in revenue with a negative net income and a positive cash flow. Positive cash flow means the money they have on hand is going up. Negative income means the company is losing money. How does a company lose money while its cash on hand grows? Creditors and investors. When you see money coming in while the company is taking losses, it generally means the company is borrowing money or getting more cash from investors/shareholders. A company can continue on this way for as long as money keeps coming in. Companies can also use tricks similar to price dumping, charging one client or project an excessive amount in order to fund losses on other projects. The amazing thing is that most companies won’t light up the same way Fyre did; they’ll just declare bankruptcy and move on. There’s not going to be a big “gotcha” parade or reckoning where anyone admits that stenographic court reporting is by far the superior business model.
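The accounting point above can be sketched with a toy example. All figures below are hypothetical and deliberately round; they are not VIQ’s actual numbers, just an illustration of how a net loss and a growing cash balance coexist:

```python
# Hypothetical, round numbers illustrating the pattern described above:
# a company can report a net loss while its cash on hand grows, because
# financing inflows (loans, new investor money) are not income.

revenue  = 8_000_000    # sales for the year
expenses = 10_500_000   # operating costs

net_income = revenue - expenses          # a loss of 2.5 million

cash_from_operations = net_income        # simplified: the loss burns cash
cash_from_financing  = 4_000_000         # new borrowing plus investor cash

opening_cash = 1_000_000
closing_cash = opening_cash + cash_from_operations + cash_from_financing

print(net_income)    # -2500000: the company "loses money"
print(closing_cash)  # 2500000: yet cash on hand went UP by 1.5 million
```

The income statement shows red ink; the cash flow statement shows the balance rising. Nothing illegal about it, but it means the headline cash number tells you how patient the backers are, not whether the business works.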
This is juxtaposed against a situation where, for the individual stenographic reporter, you’re kind of stuck making whatever you make. If things go badly, bankruptcy is an option, but there’s never really an option to borrow money or receive investor money for decades while you figure it out. Seeing all these ostensible giants enter the field can be a bit intimidating or confusing. But any time you see these staggering tech reveals wrapped up in a paid-for press release, I urge you to remember Fyre, remember VIQ, and remember that no matter what that revenue or cash flow looks like, you may not have access to the information that would tell you how the company is really doing.
This also points to a very bright future for steno entrepreneurs. As we learn the game, we can pass it along to each other. When Stenovate landed its first big investor, I talked about that. Court reporting and its attached services, in the way we know and love them, are an extremely stable, winning investment. Think about it. Many of us, when we begin down this road, spend up to $2,000 on a student machine and up to $9,000 on a professional machine and software. That $11,000 sinkhole, coupled with student loan debt, grows into stable, positive income. So what’s stopping any stenographic court reporting firm from getting out there and educating investors on our field? The time and drive to do it. Maybe some people just haven’t had that idea yet. But that’s where we’re headed. I have little doubt that if we compete, we will win. But we have to get people in that mindset. So if you know somebody with that entrepreneurial spirit, maybe pass them this post and get them thinking about whether they’d like to seek investors to grow their firm and reach. Business 101 is that a dollar today is more valuable than a dollar tomorrow. That means our field can be extremely attractive to value investors and a safe haven from the gambling money being supplied to “tech’s” habitual promisors.
Know a great reporting or captioning firm that needs a spotlight? Feel free to write me or comment about them below. I’ll start us off: Steno Captions, LLC launched recently without doing the investor dance. That’s the kind of promise this field has. I wish them a lot of luck and success in managing clients and training writers.
We’re in an interesting time. Pretty much anywhere you look there are job postings for digital reporters, articles with headlines talking about our replacement, articles with headlines talking about our angst. Over time, brilliant articles from people like Eric Allen, Ana Fatima Costa, Angie Starbuck (bar version), and Stanley Sakai start to get buried or appear dated when, in actuality, not much has changed at all. They’re super relevant and on point. Unfortunately, at least for the time being, we’re going to have to use our professional sense, think critically, and keep spreading the truth about ourselves and the tech we use.
One way to do that critical thinking is to look squarely at what is presented and notice what goes unmentioned. For example, look back at my first link. Searching for digital reporting work, ambiguous “freelance” postings come up, meaning stenographer jobs are actually branded as “digital” jobs. District courts seeking a stenographer? Labeled as a digital job. News reporters to report news about court? Labeled as a digital job. No wonder there’s a shortage; we’re just labeling everything the same way and expecting people who haven’t spent four decades in this business to figure it out. In this particular instance, ZipRecruiter proudly told me there were about 20 digital court reporter jobs in New York, but in actuality about 90 percent were mislabeled.
Another way to do it is to look at contradictions in a general narrative. For example, we say steno is integrity. So there was an article from Lisa Dees that shot back and said, basically, any method can have integrity. Can’t argue there. Integrity is kind of an individual thing. But to reach the conclusion that these things are equal, you have to ignore a lot of stuff that anyone who’s been working in the field a while knows. Stenography has a longer history and a stronger culture. With AAERT pulling in maybe 20 percent of what NCRA does on the regular, who has more money going into ethics education? Most likely stenographers. When you multiply the number of people that have to work on a transcript, you’re multiplying the risk that one of those people doesn’t have integrity. We’re also ignoring how digital proponents like US Legal have no problem going into a courtroom and arguing that they shouldn’t be regulated like court reporters because they don’t supply court reporting services. Even further down the road of integrity, we know from other digital proponents that stenography is the gold standard (thanks, Stenograph) and that the master plan for digital proponents is to use a workforce that is not highly trained. I will totally concede that these things all come from “different” sources, but they all point to each other as de facto experts in the field and sit on each other’s boards and panels. It’s very clear there’s mutual interest. So, again, look at the contradictions. “The integrity of every method is equal, but stenography is the gold standard, but we are going to use a workforce with less training.” What?
Let’s get to how to talk about this stuff, and for that, I’m going to leave an example here. I do follow the court reporting stuff that gets published by Legaltech News. There’s one news reporter, Victoria Hudgins, who has touched on steno and court reporting a few times. I feel her information is coming mostly from the digital proponents, so in an effort to provide more information, I wrote:
“Hi Ms. Hudgins. My name’s Christopher Day. I’m a stenographer in New York. I follow with great interest and admiration most of your articles related to court reporting in Legal Tech News [sic]. But I am writing today to let you know that many of the things being represented to you by these companies appear false or misleading. In the August 24 article about Stenograph’s logo, the image of Stenograph’s offices that you were given is, as best I can tell, a stock photo. In the September 11 article about court reporter angst, Livne says our field has not been digitized, but that’s simply not true. Court reporter equipment has been digital for decades. The stenotype picture you got from Mr. Rando is quite an old model, and most of us do not use those anymore. I’m happy to send you a picture of a newer model, or share evidence for any of my statements in this communication.
Our position is being badly misrepresented. We are not so much worried about the technology; we are more worried that people will believe the technology is ready for prime time and replace us with it without realizing that it is not. Livne all but admitted this himself. In Verbit’s Series A funding, he or the company stated that the tech was 99 percent accurate. In the Series B funding, he said Verbit would not get rid of the human element. These two statements don’t seem very compatible.
How come when these companies are selling their ASR, it’s “99 percent” or “ready to disrupt the market,” but when Stanford studied ASR it was, at best, 80 percent accurate?
Ultimately, if the ASR isn’t up to the task, these are transcription companies. They know that if they continue to use the buzzwords, you’ll continue to publish them, and that will draw them more investors.
I am happy to be a resource on stenographic court reporting technology, its efficiency, and at least a few of the things that have been done to address the shortage. Please feel free to reach out.”
To be very fair, because of the limitations of the website submission form, she didn’t get any of the links. But, you know, I think this stands as a decent example of how to address news people when they pick up stories about us. They just don’t know; they only know what they’re told or how things look. There will be some responsibility on our part to share our years of experience and knowledge if we want fair representation in media. It’s the Pygmalion effect at work. Expectations can impact reality. That’s why these narratives exist, and that is why a countering narrative is so important. Think about it. When digital recording first came along, the pitch was all about how it was allegedly cheaper. When that turned out not to be true, it became a call for stenographers to just see the writing on the wall and acknowledge there is a shortage and that there is nothing we can do about it. Now that, too, is turning out not to be true; we’re doing a lot about it, and all that’s left is to let those outside the industry know the truth.
A reader reminded me that Eric Allen’s article is now in archive. The text may be found here. For context purposes, it came amid a series of articles by Steve Townsend, and is an excellent example of what I’m talking about in terms of getting the truth out there.