A series of 2019 predictions by Gartner was reported on by Venture Beat on June 28, 2021. As explained in a prior post, “AI”, or machine learning, relies on datasets and algorithms. If the data is imperfect or incomplete, a computer has a chance of giving bad output. If the algorithm that tells the computer what to do with the data is imperfect, the computer has a chance of giving bad output. It’s easy to point to anecdotal cases where “AI” makes a bad call. There have been reports of discrimination in facial recognition technology, driverless cars killing people, and Amazon’s algorithm deciding to fire drivers who are doing their job. I’ve seen plenty of data on the failings of overhyped technology and commercial ASR. What I hadn’t seen prior to today was somebody willing to put a number on the percentage of AI solutions that succeed. Today, we have that number, and it’s an abysmal 15%.
Perhaps this will not come as a surprise to my readers, considering prior reports that automatic speech recognition (ASR), an example of machine learning, is only 25 to 80 percent accurate depending on who’s speaking. But it will certainly come as a surprise to investors and companies that are dumping money into these technologies. Now there’s a hard number to consider. And that 15% itself is misleading. It’s a snapshot of the total number of implementations, not just ASR. ASR comprises a percentage of the total number of implementations out there. And it’s so bad that some blogs are starting to claim word error rate isn’t really that important.
That 15% is also misleading in that it’s talking about solutions that are implemented successfully. It is not talking about implementations that provide a positive return on investment (ROI). So imagine having to go to investors and say “our AI product was implemented with 100% success, but there’s still no money in this.”
The Venture Beat article goes on to describe several ways to make AI implementation a success, and I think it’s worth examining them briefly here.
Customizing a solution for each environment. No doubt that modeling a solution for every single business individually is bound to make that solution more successful, but it’s also going to take more staff and money. This would be almost like every court reporting company having their own personal software development staff to build their own CaseCAT or Eclipse. Why don’t they do that? It’s hopelessly expensive.
Using a robust and scalable platform. The word robust doesn’t really mean anything in this context. Scalability is tied to modular design — the ability to swap out parts of the program that don’t work for specific situations. For this, you need somebody bright and forward-thinking. They have to have the capability to design something that can be modified to handle situations they may not even be aware exist. With the average software engineer commanding in the ballpark of $90,000 a year and the best of them making over $1 million a year, it’s hopelessly expensive.
Staying on course once in production. This involves reevaluating and sticking with something that may appear to be dysfunctional. This would be almost like the court reporter coming to the job, botching the transcript, and the client going “yes, I think I’ll use that guy again so that I can get a fuller picture of my operational needs.” It’s a customer service nightmare.
Adding new AI use cases over time. Piggybacking on number 3, who is going to want to continue to use AI solutions to patch what the first solution fails to address? This is basically asking businesspeople to trust that it will all work out while they burn money and spend lots of time putting out the fire. It’s a customer service nightmare.
I really respect Venture Beat trying to keep positive about AI in business, even if it’s a hopelessly expensive customer service nightmare.
With some mirth, I have to point out to those in the field that believe the stenographer shortage is an insurmountable problem that we now know machine learning in the business world has a failure rate that’s right up there with stenographic education’s failure rate. Beyond the potential of exploiting digital reporters or stealing investor money, what makes this path preferable to the one that has worked for the last hundred years? As I wrote a week ago, the competition is going to wise up. Stenographic court reporters are the sustainable business model in this field, and to continue to pretend otherwise is nothing short of fraud.
Journalists, we need to talk about court reporting.
Court reporting? What’s that? Court reporting traditionally refers to stenographic reporting, where somebody is taking down verbatim notes on a stenotype. We do this in legal proceedings as well as broadcast captioning, and believe it or not, our keyboard design, invented in 1911, is still the best technology for taking down the spoken word today, voice recognition included. Sounds incredible, right? But look at the airplane. It started out in 1903 looking something like this:
We all know what happened. The design got better and today we have airliners that can fly hundreds of people at once. Same with the camera of the 1800s that became the compact and ubiquitous technology we have today in so many devices. Very much the same happened with our stenotype. In fact, I have a handy guide here. Feel free to use that middle image in any article you want.
We started off with old timey machines where you tap the key and it punches the ink into the paper. We evolved into an era of Dictaphones and floppy disks where we’d narrate our notes to be typed up by somebody else. These days we’re packing laptops attached to minicomputers. We’re always asked “why don’t you just record it?” Truth is we’ve had that capability for a really long time. We go beyond that and have the ability to transcribe what we’ve “recorded” in record time.
We have a real perception problem in our field. There’s this ongoing push from tech sellers to come in and say our technology is old and that automation is on the way. The problem? Tech journalists, publications, and analysts often eat it up and publish it right away. I always point to this October 2020 article as a great example. It literally depicts an old-fashioned stenographer phasing out into computer code under the headline “Will court reporting undergo a pandemic shift?” It goes on to publish some quotes from Verbit and Veritext that point to things changing/evolving/shifting. The messaging is really clear. “We have the technology. Why do we need court reporters?” They term court reporter criticism “resistance.”
A lie is being sold. This isn’t something that takes heavy investigation to figure out. When asked about the field in that article, Veritext’s CTO was quoted as saying “there will be no choice but to move forward with well-proven audio and transcription technologies and services to meet the need, and we expect to see rapid adoption there.” Meanwhile, when asked for a quote for Stenonymous, Veritext said with regard to technology “…it will not take the place of the stenographer…” They’re not alone. Tom Livne of Verbit has been quoted saying our world is not yet digitized when I just showed you that it is digitized and it has been for decades. In series A funding for Verbit, claims of 99 percent accuracy were thrown around. In series B funding, it was said that technology would not replace the human. All these automation proponents are pretty quick to dismiss automation. Could it be that automation of speech transcription is just not that simple?
It would be fair enough if it was just my word against theirs. But there are outside observers that have looked at the situation and concluded all the same things. In a May 2020 article from Scientific American, journalist Wade Roush noted that speech recognition simply was not perfect and might never be. He pointed to Microsoft’s claim in 2016 of 96 percent accuracy, better than human transcribers, and noted there have been few improvements since. In the Stanford study “Racial disparities in automated speech recognition” it was noted that automatic speech recognition was 80 percent accurate on white speakers, 65 percent accurate on black speakers, and “worse” on the AAVE dialect. “Worse” meant 25 to 50 percent accurate. So here we are taking stenographic reporter accuracy, certified with 5 percent total errors or fewer and comparing it to a word error rate between 20 percent and 75 percent.
But do we really need accurate court transcripts and good captions for captioning consumers? Nobody cares who this hurts as long as it makes investors happy, right? Sadly, there’s not much evidence to show that it’ll even do that. Much of the financial data for court reporting is hidden through private companies or paywalled research data. When I examined VIQ Solutions, a company that recently acquired Net Transcripts and is ostensibly part of the “record and transcribe” crowd, I pointed out there’s plenty of revenue there, but net losses. In news regarding Verbit’s acquisition of VITAC, it was stated that revenue was in the millions and cash flow was positive, which means it’s likely profits are low or non-existent. At the risk of sounding like a cynic, I think it’s very clear that if there were profit and a decent rate of return, they would be unreserved in telling us that. Like other AI ventures, there’s probably just a slow burn of money. So every time a writer jumps aboard the “technology” train without consulting anybody that actually works in the field or doing a little research, it’s burying the truth a little deeper under this false perception and really hurting a vibrant, viable field that really needs people. We’re not so different. The tech sellers are coming for your job too, and it’s just as hilarious and embarrassing.
The other issue we have is that we know you can figure this stuff out. When our job is up for grabs, there’s a kind of jubilant repetition of the word “disruption.” Meanwhile, when it’s a job that has some sense of importance or power, like a judge, journalists begin explaining things. Take this article on Chinese holographic AI judges, where the author makes sure to point out there are differences in American and Chinese law that may make this more plausible, as well as explaining that the “AI” is only as good as its dataset. This is a big problem, because the companies invested in “AI” have zero accountability. If someone brings up issues with a technology’s output being racist or sexist, they are summarily fired and their opinion swept under the rug. In my field, at the very least, every member of the public is entitled to make a complaint about a court reporter that violates our ethical codes. That’s on top of any legal remedies that may be available or justified in the event a court reporter acts irresponsibly! If you can’t get it right when you report on it, these companies are not going to correct you when you’re wrong in their favor.
All we are asking for is some fairness in the way our field is reported on in the news. I’ve often joked that advocating for stenographic court reporting is a lot like the children’s story Horton Hears A Who. We’re here but we’re unseen and unheard. We’re in danger of being boiled by big money and tall tales. Those of us that speak up can face a lot of ridicule or be cut short. Take my appearance on VICE News about the Testifying While Black study. Here’s an important topic that deserves headlines, namely, addressing disparate outcomes based on the dialect that someone speaks. I was filmed for about two hours, and Taylor Jones and his people were, as I understand it, filmed close to nine hours. Nobody expected a ten-hour special, but this topic got fifteen minutes. Court reporters took some serious heat in the news because we scored 80 percent accuracy in African American Vernacular English dialect. Every single news source I’ve seen has missed or excluded pilot study 1, where regular people scored 40 percent, and pilot study 2, where lawyers scored 60 percent. VICE cut me talking about the pilot studies and how people who really care about record accuracy need to join our field. You have a story here where court reporters are twice as good as the average person at hearing and transcribing this AAVE dialect that we get no training in, and that got warped into many variations of “court reporters don’t understand black people.” That’s a concept mired in ignorance. The story itself acknowledges not all black people speak AAVE, and yet the headline and lede rip on us despite the fact that we’re the most likely ones in the room to understand AAVE. I cannot imagine such an irresponsible word game.
It’s almost like publishing an article with the headline “Journalists May Be Reporting Black People’s Stories Wrong” just because they might ostensibly fit into the category of “regular people.” But I can’t imagine that anyone would ever lump groups of people together and make broad, false headlines just for clicks. Oh, wait —
Even in a pretty amazing article about social justice where I got to offer some input, the accuracy of us versus others ended up not making the cut. I like the author a lot, but it’s pretty clear that somewhere along the way a decision was made to exclude the possibility that we’re hearing people better than anyone else in the room. Not much different from when Max Curry was quoted as saying digital reporting was too risky, but there was hardly any explanation as to why, despite a field of nearly 30,000 people and data that suggests recording proceedings achieves no real cost reduction and no efficiency gains. See what I’m saying? Sometimes it’s worse than simply not publishing anything from us. Sometimes it’s cherry picking what we say to make it look like both sides are represented when they simply aren’t or that a topic was explained when it simply wasn’t.
I know that this perceived unfairness is a result of many factors and that some are outside your control. The drive to get people to read strongly encourages clickbait journalism. Editors and outlets can decide to cut journalists’ work if it doesn’t adhere to a particular narrative or standard. The fact that court reporting and machine shorthand stenography is a fairly niche skill adds to the dilemma. Industry outsiders are not going to know there are national, state, and nonprofit databases to find court reporters for interviews and demonstrations. There are a myriad of issues that coalesce to create the situation I’m describing. But we really need some attention on these issues. We create the legal record for so many lawsuits and criminal prosecutions. We broadcast (good) captions so that the deaf can have access. The inadequacy of alternatives cannot be overstated. But the average reporter age is 55 now, and to continue our good work we’ll need the media to be unafraid of publishing the truth. Help us attract people who will carry on this service for generations. We need the media to stop republishing the shortage forecast from 8 years ago and point people towards all the resources that we have built since then to help people find this career, such as Open Steno, Project Steno, and NCRA A to Z.
There is a small, loud contingent in the private sector that describes our stenographer shortage as mathematically impossible to solve. Years ago, the Court Reporting Industry Outlook by Ducker Worldwide, in a nutshell, forecasted that the demand for stenographic reporters would eclipse the supply. At that point, in the 2013-2014 report, it was forecasted that about 70 percent of existing reporters would retire over the next 20 years. It was forecasted that in 2018 there would be a gap of about 5,500 court reporters due to increased demand and retirements. In a breakdown by state, it was clear that California, Texas, Illinois, and New York would have it the hardest, but the prediction was a gap of at least 100 reporters in several states by 2018.
This is but one of a few bold arguments put out by digital recording proponents as to why the modality of taking the record must change away from stenographic reporting. As reporters and committees like NCRA Strong started to push back against the myth that digital was better or cheaper, and developed resources to help others explain the truth, the stenographer shortage became the last bastion of hope for recording equipment to take reporter seats.
It’s a simple message that’s easy to digest: “It takes too long to train stenographers and the failure rate is too high, therefore we must change.” This argument is even embraced by CSRs working for larger agencies that have actively promoted digital reporting as the way forward, such as Veritext or US Legal. I take umbrage at this simple message because it’s a lie. This idea that there is nothing we can do is a lie by omission, and it ignores any and all progress we’ve made in recruitment. Since the Ducker Report, Open Steno has expanded exponentially, introducing stenography, and free resources to learn it, to people all over the world. Its Discord channel continues to grow and has hundreds of users online each day.
Also since the Ducker Report, NCRA A to Z was born. Project Steno began heavy recruitment activity. Independent actors such as Allison Hall have worked in their own communities to get programs started and flourishing. Again, all things generally ignored by the we-must-record crowd. It’s only business, right? If they can’t fill the seats, it’s not their fault! But it’s painfully obvious that digital recording proponents are not attempting to build interest in stenographic reporting. We are a community, and some members of our community are obsessed with spouting the shameful idea that there’s just nothing that can be done while watching everyone else do the work.
But even those of us who know all about the shortage and have worked in some capacity to fix it have overlooked some important industry comparisons. In the tech world, there’s a forecasted need of some 1.4 million workers and an expected graduation of 400,000 workers. If our 5,000-person shortage is mathematically impossible to solve then tech must be absolutely doomed, right? It takes a whole four years to get a computer science degree! Time to replace all the programmers with robots, right? Nope. Instead, the argument is made to look at the number of self-taught people or people that do not have a traditional degree. The argument is made that programmers should be paid more to entice workers. Even in fields of “unskilled workers”, when there is a shortage, they don’t sit around and whine about there being nothing they can do, they jack up the prices to reflect demand.
Compare this to our field, where freelance reporters in New York are currently working for less than 1991 rates adjusted for inflation and companies still aren’t happy. At a certain point, there’s simply no more we can give. We’d each do better taking our own customers and binding our own transcripts than continue to forfeit large percentages of our money just so we don’t have to handle clients. To illustrate this better, the following is a chart for the average US worker hourly pay adjusted for inflation.
If we were to have an identical chart for reporting in New York, for reporters making under $5.50 a page on their original, the number would be decreasing. We’re not just behind the average US hourly worker, we are steadily losing ground and the gap is widening. It’s not really surprising we’re having trouble filling seats. It’s good money for what we do, but the great money in the private sector has been quietly locked behind roughs and realtime, forcing reporters to work harder and write more to have the same buying power.
The above notes on pay come with a caveat. I’m not a stupid man. I know the money in this field comes from the copy sales. I know that’s very unlikely to change in the near future. But for an honest comparison, I’ve examined the original prices. If the original prices are that deflated, reporters have to ask themselves whether copy rates have budged when adjusted for inflation, and there’s no evidence to suggest they have.
So when we are discussing shortage, I hope there are four points everyone will remember and use to educate fellow reporters who buy the line that there’s nothing we can do.
1. The number of self-taught reporters is not counted, making our shortage forecast larger than it is.
2. There are many more programs and resources for people who want to learn about stenography today than there were when the stenographer shortage was forecasted. Some examples include NCRA A to Z, Open Steno, and Project Steno.
3. Companies that genuinely care about the shortage can directly impact it by promoting steno, relaxing deadlines, or increasing reporter pay, which is in line with other industries.
4. With an estimated 30,000 stenographers, if we each spent an hour a year on recruitment activity, it would be the equivalent of 82 hours of recruitment a day, far more time than any company is spending promoting or recruiting for other modalities.
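The arithmetic in point 4 is simple enough to verify (the 30,000 figure is the estimate used throughout this post):

```python
# 30,000 stenographers each spending one recruitment hour per year.
stenographers = 30_000
hours_per_year = stenographers * 1   # one hour apiece
hours_per_day = hours_per_year / 365

print(round(hours_per_day))  # 82
```

Spread across a calendar year, one hour apiece really does work out to about 82 hours of recruitment every single day.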
With the news that Verbit has bought VITAC, there was some concern on steno social media. For a quick history on Verbit, it’s a company that claimed 99 percent accuracy in its series A funding. In its series B funding it was admitted that their technology would not replace the human. Succinctly, Verbit is a transcription company where its transcribers are assisted by machine learning voice recognition. Of course, this all has the side effect of demoralizing stenographers who sometimes think “wow, the technology really can do my job” because nobody has the time to be a walking encyclopedia.
But this idea that Verbit, a company started in 2016, figured out some super secret knowledge is not realistic. To put voice recognition into perspective, it’s estimated to be a market worth many billions of dollars. Microsoft is seeking to buy Nuance, the maker of Dragon, for about $20 billion. Microsoft has reportedly posted revenue over $40 billion and profit of over $15 billion. Verbit, by comparison, has raised “over $100 million” in investor money. It reports revenue in the millions and positive cash flow. Another company that reports revenue in the millions and positive cash flow? VIQ Solutions, parent of Net Transcripts. As described in a previous post, VIQ Solutions has reported millions in revenue and a positive cash flow since 2016. What’s missing? The income. Since 2016, the company hasn’t been profitable.
Obviously, things can turn around, companies can go long periods of time without making a profit, bounce back, and be profitable. Companies can also go bankrupt and dissolve a la Circuit City or be restructured like JCPenney. The point is not to disparage companies on their financials, but to give stenographic captioners real perspective on the information they’re reading. So, when you see this blurb here, what comes to mind?
Hint. What’s not being mentioned? Profit. While this is not conclusive, the lack of any mention of profit tells me the cash flow and revenue are fine, but there are no big profits as of yet. Cash flow can come from many things, including investors, asset sales, and borrowing money. Most of us probably make in the ballpark of $50,000 to $100,000. Reading that a company raised $60 million, ostensibly to cut in on your job, can be pretty disheartening. Not so once you see that they’re a tiny fraction of the overall picture and that players far bigger than them have not taken your job despite working on the technology for decades.
Moreover, we have a consumer protection crisis on our hands. At least one study in 2020 showed that automatic speech recognition can be 25 to 80 percent accurate depending on who’s speaking. There are many caption advocates out there, such as Meryl Evans, trying to raise awareness on the importance of caption quality. The messaging is very clear: automatic captions are crap (autocraptions), they are often worse than having no captions, and a single wrong word can cause great confusion for someone relying on the captions. Just go see what people on Twitter are saying about #autocraptions. “#NoMoreCraptions. Thank you content creators that do not rely on them!”
This isn’t something I’m making up. Anybody in any kind of captioning or transcription business agrees a human is required. Just check out Cielo24’s captioning guide and accuracy table.
If someone’s talking about an accuracy level of 95 percent or better, they’re talking about human-verified captions. If you, captioner, were not worried about Rev taking away your job with its alleged 50,000 transcribers, then you should not throw in the towel because of Verbit and its alleged 30,000 transcribers. We do not know how much of that is overlap. We do not know how much of that is “this transcriber transcribed for us once and is therefore part of our ‘team.'” We do not know how well transcription skills will fit into the fix-garbage-AI-transcription model. The low pay and mistreatment that comes with “working for” these types of companies is going to drive people away. Think of all the experiences you’ve had to get you to your skill level today. Would you have gotten there with lower compensation, or would you have simply moved on to something easier?
Verbit’s doing exceptionally well in its presentation. It makes claims that would cost quite a bit of time and/or money to disprove, and the results of any such investigation would be questioned by whoever it did not favor. It’s a very old game of making claims faster than they can be disproven and watching the fact checkers give you more press as they attempt to parse what’s true, partially true, and totally false. This doesn’t happen just in the captioning arena, it happens in legal reporting too.
This seems like a terrifying list of capabilities. But, again, this is an old game. Watch how easy it is.
It took me 15 seconds to say six lies, one partial truth, and one actual truth. Many of you have known me for years. How long will it take you to figure out which was which? How long would it take you to prove to another person what’s true and what’s false? This is, in part, why it is easier for falsehoods to spread than the truth. This is why in court and in science, the person making a claim has to prove their claim. We have no such luxury in the business world. As an example, many years ago in the gaming industry Peter Molyneux got up on stage and demo’d Milo. He said it was real tech. Here was this dynamically interactive virtual boy who’d be able to understand gamers and their actions. We watched it with our own eyes. It was so cool. It was BS. It was very likely scripted. There was no such technology and there is no such technology today, over eleven years later. Do you think Peter, Microsoft, or anybody got in trouble for that? Nope. In fact, years later, he claimed “it was real, honest.”
Here’s the point: Legal reporters and captioners are going to be facing off with these claims for an indeterminate amount of time. These folks are going to be marketing to your clients hard. And I just showed you via the gaming industry that there are zero consequences for lying and that anything that is lied about can just be brushed up with another lie. There will be, more or less, two choices for every single one of you.
Compete / Advocate. Start companies. Ally with deaf advocates.
Watch it happen.
I have basically dedicated Stenonymous to providing facts, figures, and ways that stenographers can come out of the “sky is falling” mindset. But I’m one guy. I’m an official in New York. Science says there’s a good chance what we expect to happen will happen and that’s why I fight like hell to get all of you to expect us to win. That’s also why these companies repeat year after year that they’re going to automate away the jobs even when there’s zero merit or demand for an idea. You now see that companies can operate without making any profit, companies can lie, much bigger companies haven’t muscled in on your job, and that the giant Microsoft presumably looked at Verbit, looked at Nuance, and chose Nuance.
I’m not a neo-luddite. If the technology is that good, let it be that good. Let my job vanish. Fire me tomorrow. But facts are facts, and the fact is that tech sellers take the excellent work of brilliant programmers and say the tech is ready for prime time way before it is. They never bother to mention the drawbacks. Self-driving cars and trucks are on the way, don’t worry about whether it kills someone. Robots can do all these wonderful things, forget that injuries are up where they’re in heaviest use. Solar Roadways were going to solve the world’s energy problems but couldn’t generate any energy or be driven on. In our field, lives and important stakeholders are in danger. What happens when there’s a hurricane on the way and the AI captioning tells deaf people to drive towards danger?
Again, two choices, and I’m hoping stenographic captioners don’t watch it happen.
There’s a lot of conjecture when it comes to automatic speech recognition (ASR) and its ability to replace the stenographic reporter or captioner. You may also see ASR discussed alongside natural language processing (NLP), a related field. An important piece of the puzzle is understanding the basics behind artificial intelligence and how complex problems are solved. This can be confusing for reporters because in any of the literature on the topic, there are words and concepts that we simply have a weak grasp on. I’m going to tackle some of that today. In brief, computer programmers are problem solvers. They utilize datasets and algorithms to solve problems.
What is an algorithm?
An algorithm is a set of instructions that tell a computer what to do. You can also think of it as computer code for this discussion. To keep things simple, computers must have things broken down logically for them. Think of it like a recipe. For example, let’s look at a very simple algorithm written in the Python 3 language:
Line one tells the computer to put the words “The stenographer is _.” on the screen. Line two creates something called a Stenographer, and the Stenographer is equal to whatever you type in. If you input the word awesome with a lowercase or uppercase “a” the computer will tell you that you are right. If you input anything else, it will tell you the correct answer was awesome. Again, think of an algorithm like a recipe. The computer is told what to do with the information or ingredients it is given.
What is a dataset?
A dataset is a collection of information. In the context of machine learning, it is a collection that is put into the computer. An algorithm then tells the computer what to do with that information. Datasets will look very different depending on the problem that a computer programmer is trying to solve. As an example, for enhancing facial recognition, datasets may be composed of pictures. A dataset may be a wide range of photos labeled “face” or “not face.” The algorithm might tell the computer to compare millions of pictures. After doing that, the computer has a much better idea of what faces “look like.”
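To make that concrete, here is a toy sketch of such a labeled dataset (the filenames and labels are invented for illustration):

```python
# A toy labeled dataset for face detection: each entry pairs an input
# (here just a filename standing in for the actual pixels) with a label.
dataset = [
    ("photo_001.jpg", "face"),
    ("photo_002.jpg", "not face"),
    ("photo_003.jpg", "face"),
]

# The algorithm tells the computer what to do with that information --
# for instance, count how many examples of each label it gets to practice on.
counts = {}
for _, label in dataset:
    counts[label] = counts.get(label, 0) + 1

print(counts)  # {'face': 2, 'not face': 1}
```

A real dataset would hold millions of such pairs, but the shape is the same: inputs plus the answers the computer is supposed to learn.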
What is machine learning?
As demonstrated above, algorithms can be very simple steps that a computer goes through. Algorithms can also be incredibly complex math equations that help a computer analyze datasets and decide what to do with similar data in the future. One issue that comes up with any complex problem is that no dataset is perfect. For example, with regard to facial recognition, there have been systems that were almost 100 percent accurate on lighter male faces but only 80 percent accurate on darker female faces. There are two major ways this can happen. One, the algorithm may not accurately instruct the computer on how to handle the differences between a “lighter male” face and a “darker female” face. Two, the dataset may not equally represent all faces. If the dataset has more “lighter male” faces in this example, then the computer will get more practice identifying those faces, and will not be as good at identifying other faces, even if the algorithm is perfect.
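A toy illustration of how that per-group gap is measured, using made-up evaluation counts that echo the figures above:

```python
# Hypothetical evaluation results: the model saw far more "lighter male"
# examples during training, so it performs better on that group.
results = {
    "lighter male": {"correct": 99, "total": 100},
    "darker female": {"correct": 40, "total": 50},
}

# Accuracy per group = correct predictions / total predictions for that group.
accuracy = {group: r["correct"] / r["total"] for group, r in results.items()}
for group, acc in accuracy.items():
    print(f"{group}: {acc:.0%}")
```

Overall accuracy across all 150 faces would look respectable, which is exactly why researchers report it broken down by group.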
Artificial intelligence / AI / voice recognition, for purposes of this discussion, are all synonymous with each other and with machine learning. The computer is not making decisions for itself, like you see in the movies; it is being fed lots of data and using that data to make future decisions.
Why Voice Recognition Isn’t Perfect and May Never Be
Computers “hear” sound by taking the air pressure from a noise into a microphone and converting that to electronic signals or instructions so that it can be played back through a speaker. A dataset for audio recognition might look something like a clip of someone speaking paired with the words that are spoken. There are many factors that complicate this. Datasets might be focused on speakers that speak in a grammatically correct fashion. Datasets might focus on a specific demographic. Datasets might focus on a specific topic. Datasets might focus on audio that does not have background noises. Creating a dataset that accurately reflects every type of speaker in every environment, and an algorithm that tells the computer what to do with it, is very hard. “Training” the computer on imperfect datasets can result in a word error rate of up to 75 percent.
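Word error rate, the measurement behind figures like that 75 percent, is typically computed as the edit distance between the reference transcript and the machine's output, divided by the number of reference words. Here is a minimal sketch of that calculation (my own toy implementation, not any vendor's):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.
    Computed with a classic dynamic-programming edit distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Two dropped words out of six: about 0.33, i.e. 33 percent WER.
print(word_error_rate("the cat sat on the mat", "the cat sat mat"))
```

A 75 percent word error rate means roughly three of every four reference words would need some correction.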
This technology is not new. There is a patent from 2000 that seems to be a design for audio and stenographic transcription to be fed to a “data center.” That patent was assigned to Nuance Communications, the owner of Dragon, in 2009. From the documents, as I interpret them, it was thought that 20 to 30 hours of training could result in 92 percent accuracy. One thing is clear: as far back as 2000, 92 percent accuracy was in the realm of possibility. As recently as April 2020, the data studied from Apple, IBM, Google, Amazon, and Microsoft showed 65 to 80 percent accuracy. Assuming, from Microsoft’s intention to purchase Nuance for $20 billion, that Nuance is the best voice recognition on the market today, there’s still zero reason to believe that Nuance’s technology is comparable to court reporter accuracy. Nuance Communications was founded in 1992. Verbit was founded in 2016. If the new kid on the block seriously believes it has a chance of competing, and it seems to, that’s a pretty good indicator that Nuance’s lead is tenuous, if it exists at all. There’s a list of problems for automation of speech recognition, and even though computer programmers are brilliant people, there’s no guarantee any of them will be “perfectly solved.” Dragon trains to a person’s voice to get its high level of accuracy. It simply would not make economic sense to spend hours training the software to every person who will ever speak in court, and the process would be susceptible to sabotage or mistake if it were unmonitored and/or self-guided (AKA cheap).
This is all why legal reporting needs the human element. We are able to understand context and make decisions even when we have no prior experience with a situation. Think of all the times you’ve heard a qualified stenographer, videographer, or voice writer say “in 30 years, I’ve never seen that.” For us, it’s just something that happens, and we handle whatever the situation is. For a computer that has never been trained with the right dataset, it’s catastrophic. It’s easy, now, to see why even AI proponents like Tom Livne have said that they will not remove the human element.
Why Learning About Machine Learning Is Important For Court Reporters
Machine learning, or applications fueled by machine learning, is very likely to become part of our stenographic software. If you don’t believe me, just read this snippet about Advantage Software’s Eclipse AI Boost.
If you’ve been following along, you’ve probably figured out, and it pretty much lays it out here, that datasets are needed to train “AI.” There are a few somewhat technical questions that stenographic reporters will probably want answered at some point:
Is this technology really sending your audio up to the Cloud and Google?
Is Google’s transcription reliable?
How securely is the information being sent?
Is the reporter’s transcription also being sent up to the Cloud and Google?
The reasons for answering?
The sensitive nature of some of our work may make it unsuitable for being uploaded. To the extent stuff may be confidential, privileged, or ex parte, court reporters and their clients may simply not want the audio to go anywhere.
Again, as shown in “Racial disparities in automated speech recognition” by Allison Koenecke, et al., Google’s ASR word error rate can be as high as 30 percent. Having to fix 30 percent of a job is a frightening possibility that could be more a hindrance than a help. I’m a pretty average reporter, and if I don’t do any defining on a job, I only have to fix 2 to 10 percent of any given job.
If we assume that everyone is fine with the audio being sent to the cloud, we must still question the security of the information. I assume that the best encryption possible would be in use, so this would be a minor issue.
The reporter’s transcription carries not only all the same confidential information discussed in point 1, but also would provide helpful data to make the AI better. Reporters will have to decide whether they want to help improve this technology for free. If the reporter’s transcription is not sent up with the audio, then the audio would only ostensibly be useful if human transcribers went through the audio, similar to what Facebook was caught doing two years ago. Do we want outside transcribers having access to this data?
Our technological competence changes how well we serve our clients. Nobody reading this needs to become a computer genius, but being generally aware of how these things work and some of the material out there can only benefit reporters. In one of my first posts about AI, I alluded to the fact that just because a problem is solvable does not mean it will be solved. I didn’t have any of the data I have today to assure me that my guess was correct. But I saw how tech news was demoralizing my fellow stenographers, and I called it as I saw it even though I risked looking like an idiot.
It’s my hope that reporters can similarly let go of fear and start to pick apart the truth about what’s being sold to them. Talk to each other about this stuff, pros and cons. My personal view, at this point, is that a lot of these salespeople saw a field with a large percentage of women sitting on a nice chunk of the “$30 billion” transcription industry, and assumed we’d all be too risk averse to speak out on it. Obviously, I’m not a woman, but it makes a lot of sense. Pick on the people that won’t fight back. Pick on the people that will freeze their rates for 20 or 30 years. Keep telling a lie and it will become the truth because people expect it to become the truth. Look how many reporters believe audio recording is cheaper even when that’s not necessarily true.
Here’s my assumption: a little bit of hope and we’ve won. Decades ago, a scientist named Curt Richter did an experiment in which rats were placed in water. It took them a few minutes to drown. Another group of rats was pulled out of the water just before drowning. The next time those rats were submerged, they swam for hours to survive. We’re not rats, we’re reporters, but I’ve watched this work for humans too. Years ago, doctors estimated a family member would live about six more months. We all rallied around her and said “maybe they’re wrong.” She went another three years. We have a totally different situation here. We know they’re wrong. Every reporter has a choice: sit on the sideline and let other people decide what happens, or become advocates for the consumers we’ve been protecting for the last 140 years, since before the stenotype design we use today was even invented. People have been telling stenographers that their technology is outdated since before I was born, and it’s only gotten more advanced since that time. Next time somebody makes such a claim, it’s not unreasonable for you to question it, learn what you can, and let your clients know what kind of deal they’re getting with the “new tech.”
Some readers checked in with the Eclipse AI Boost, and as it was relayed to me, the agreement is that Google will not save the audio and will not be taking the stenographic transcriptions. Assuming that this is true, my current understanding of the tech is that stenographers would not be helping improve the technology by using it, unless there’s some clever wordplay going on: “we’re not saving the audio, we’re just analyzing it.” At this point, I have no reason to suspect that kind of a game. In my view, our software manufacturers tend to be honest because there’s simply no truth worth getting caught in a lie over. The worst I have seen are companies using buzzwords to try to appease everyone, and I have not seen that from Advantage.
Admittedly, I did not reach out to Advantage myself because this post was meant to help reporters understand the concepts rather than to serve as a news story. But I’m very happy people took that to heart and started asking questions.
We try to keep political stuff from being published here unless it’s educational, about court reporting, or about the industry. I’ve been pretty good about this. Commentators have been great about it. The occasional guest writer has been amazing with it. This topic touches on politics, but it’s not strictly political, so it should be fun to learn about.
It’s established that the United Kingdom, United States, China, Russia and several other countries view the internet as, more or less, another theater of war. They’ve had operatives and people hired to create fake posts, false comments, and advance the interests and ideas of the government. The prices reported? Eight dollars for a social media post, $100 for ten comments, and $65 for contacting a media source. In the case of China, they’re reportedly working for less than a dollar. If the host country allows it, you have trolls for hire.
So in the context of stenography and the court reporting industry, it seems like whenever we get into the news, there are regular comments from regular people, such as “why not just record it?” Typical question. Anyone would ask this question. There are fun comments like “Christopher Day the stenographer looks like he belongs on an episode of Jeopardy.” Then there are comments that go above and beyond that. They make claims like — well, just take a look.
“…I gonna tell you that in modern technology we can record something like court testimony for hundreds of years back very easily…” “…the technology is smarter every single second…” “…if you store data in the digital format we can use an AI to extract the word from the voice in the data, it will be very accurate so much so the stenographer loses their jobs.” Wow! Lose our jobs? I felt that in my heart! Almost like it was designed to hurt a stenographer’s feelings. Right?
We can store the video for hundreds of years? Maybe. But consider that text files, no matter what way you swing it, are ten times smaller than audio files. They can be thousands of times smaller than video files. Take whatever your local court is paying for storage today and multiply that by 8,000. Unless we want a court system that is funded by advertisements a la Youtube, the taxpayer will be forced to cough up much more money than they are today. That’s just storing stuff.
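The rough arithmetic behind that multiplier can be sketched as follows. All three per-hour sizes are my own ballpark assumptions for illustration, not measurements:

```python
# Ballpark figures (assumptions for illustration, not measurements).
text_kb_per_hour = 60             # ~60 KB of plain-text transcript per hour
audio_kb_per_hour = 60 * 1000     # compressed audio at roughly 1 MB per minute
video_kb_per_hour = 500 * 1000    # compressed video at roughly 500 MB per hour

print(round(audio_kb_per_hour / text_kb_per_hour))   # audio: ~1,000x the text
print(round(video_kb_per_hour / text_kb_per_hour))   # video: ~8,000x the text
```

Under those assumptions, swapping transcripts for video really does multiply the storage bill by thousands.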
The technology is getting smarter every second? No, not really. Whenever it’s analyzed by anybody who isn’t selling it, it’s actually pretty dumb and has been that way for a while. Take Wade Roush’s May 2020 article in the Scientific American (pg 24). “But accuracy is another matter. In 2016 a team at Microsoft Research announced that it had trained its machine-learning algorithms to transcribe speech from a standard corpus of recordings with record-high 94 percent accuracy. Professional human transcriptionists performed no better than the program in Microsoft’s tests, which led media outlets to celebrate the arrival of ‘parity’ between humans and software in speech recognition.”
“…And four years after that breakthrough, services such as Temi still claim no better than 95 percent — and then only for recordings of clear, unaccented speech.” Roush concludes, in part, “ASR systems may never reach 100 percent accuracy…” So technology isn’t getting smarter every second. It’s not even getting smarter every half decade at this point.
“…we can use an AI to extract the word from the voice in the data…” This technology exists, kind of, but perfecting it would be like perfecting speech recognition. Nobody’s watching 500 hours of video to see if it accurately returns every instance of a word. Ultimately, you’re paying for the computer’s best guess. Sometimes that’ll be pretty good. Sometimes you won’t find the droid you’re looking for.
Conclusion? This person’s probably not in the media transcoding industry, probably doesn’t know what they’re talking about, and is in all likelihood a troll. Were they paid to make that comment? We don’t know. But I think it’s time to realize that marketplaces are ripe for deception and propaganda. So when you see especially mean, hateful, targeted comments, understand that there’s some chance that the person writing the comment doesn’t live in the same country as you and doesn’t actually care about the topic they’re writing about. There’s some chance that person was paid to spread an opinion or an idea. Realizing this gives us power to question what these folks are saying and be agents of truth in these online communities. Always ignoring trolling leads to trolling leading the conversation. So dropping the occasional polite counterview when you see an obvious troll can make a real impact on perception. The positive perception of consumers and the public is what keeps steno in business.
The best part of all this? You can rest easier knowing some of those hateful things you see online about issues you care about are just hired thugs trying to divide us. If a comment is designed to hurt you, you might just be talking to a Russian operative.
I understand readers will be met with the Scientific American paywall. I would open myself up to copyright problems to display the entire article here. If you’d like to speak out against the abject tyranny of paywalls, give me money! I’m kidding.
I wrote some time ago about how I wanted to combine all my steno-related computer coding into one thing so that I could troubleshoot one thing instead of keeping track of multiple projects. This early version of the Stenonymous Suite contains the WKT test generator, the finger drill generator, something I call a streamer, which streams the text you give it at the rate you set, and a tool that automatically marks .txt files for dictation. As those of you who have manually marked dictation know, it can take upwards of 10 minutes per marking. This program will mark a file in about one second and has saved me over 15 hours of manually marking dictations.
If you are a stenographic educator or dictation enthusiast, this program is totally free and has no strings attached, but I am also willing to put it through the marker program for you at a rate of 25 cents per marking, $5 minimum.
There are various types of learners. Some like to see things in print. Some like to watch videos. I’m a one-man shop, and can’t tailor everything to every learning type, but I do make it a point to try to be accessible and offer multiple solutions to a thing. I’ve got a video on this topic, but it makes good sense to have written instructions.
It’s easy. Take the WPM you want to mark. Let’s say 40 WPM. Divide that by 4. That gives you how many words you need to say every 15 seconds to hit 40 WPM. Often we mark the 15-second intervals with some kind of indicator, like slash marks ( // ), either manually or automatically. Then we read back the dictation and make sure we hit a slash mark every 15 seconds. Just keep in mind that this is for word count only. Standard dictation has an average syllabic density of 1.5 syllables per word. So a dictation marked for word count at 40 WPM should look something like the example below:
“There are several things that we must remind ourselves //from time to time.
Succinctly, we must remember that in //a great state like New York the right of the //jury trial is not absolute. In New York City a //person charged with a B misdemeanor can be forced to //trial by judge as opposed to trial by jury. This //trial by a judge is also called a bench trial.
//This can be confusing for a layperson like myself because //we are taught that in America a person must be //found guilty by a jury of his or her peers //before he or she may be convicted of a crime. //There’s no shame in holding this belief, as Article III //of the American constitution and the Sixth Amendment suggest that //one must be tried by a jury.
The Supreme Court //of the United States decided that the right to a //jury trial only pertains to serious crimes. Serious crimes are //defined in terms of jail exposure. If the potential jail //time is six months or less, a crime is not //serious and so does not need to be tried by //jury.
Even more fascinating is that the jail time is //looked at per offense. So if someone is charged with //and convicted of 21 B misdemeanors and sentenced consecutively as //opposed to concurrently, that person could theoretically go to jail //for 10 years without a trial by jury.
I think //that the best way to find out what the American //public think of this concept is to publicize it. Obviously, //all of this information is available publicly and can be //found easily in an internet age. The problem with nearly //infinite knowledge is that we take it for granted and //don’t challenge our beliefs to see how accurate or inaccurate //they may be.
As I said before, from time to //time, you should remind yourself that there are things that //you may believe or take for granted that are not //true, or not completely true. Ignorance certainly has its place //in life, and we cannot always search for every answer //all the time, but it is worthwhile, from an academic //and philosophical perspective, to question.
Question yourself. Question what you //believe. When you’re finished questioning all of that, question it //again. Great things can come from an inquiring, honest mind. //One does not need to be a genius in order //to innovate. One need only be reliable, persistent, and considerate //to become an agent of change in the local, state, //national, or even international communities.”
It’s really that simple. With a little time and effort, anyone can do it.
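In fact, the whole procedure can be automated in a few lines. Here is a minimal sketch of a marker along those lines; this is my own toy version, not the Stenonymous Suite's actual code:

```python
# A toy dictation marker: insert a // before every (wpm / 4)th word,
# i.e. one marker per 15 seconds of dictation at the target speed.
def mark_dictation(text, wpm):
    words_per_marker = wpm // 4   # words per 15-second interval (wpm >= 4 assumed)
    words = text.split()
    marked = []
    for i, word in enumerate(words):
        if i > 0 and i % words_per_marker == 0:
            # Fuse the marker to the word, as in the sample above.
            marked.append("//" + word)
        else:
            marked.append(word)
    return " ".join(marked)

# At 40 WPM, a marker lands before every 10th word.
print(mark_dictation("one two three four five six seven eight nine ten "
                     "eleven twelve thirteen fourteen fifteen sixteen", 40))
```

Run on a full .txt file, this turns a ten-minute manual chore into a fraction of a second.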
Seems like every day now there’s a new article talking about the great advances of AI transcription. Notice in what I just linked, the author is “Wire Contributor,” which to me means that it’s probably a Trint employee. The September 2019 article goes on to link an April 2017 article where the Wire apparently said something they did was unprecedented.
If you’re not looking at dates and glancing over it, it looks like AI transcription is making leaps and bounds. It’s coming. Their app is to be released at the end of 2019! What will we do? I am here to hopefully get everyone thinking critically. Why are these articles always sporting a technology that’s critically acclaimed but not ready to be publicly released? Because it’s a pitch. It’s an effort to get more investors. It’s a bid to get more people to throw money at it.
Not to get too controversial, but I’ve long watched a YouTuber scientist named ThunderF00t (Phil Mason). He’s made many videos to raise consumer awareness on products and inventions like the Free Electric, Solar Roadways, the Zero Breeze, and the Fontus. All of these amazing things have a common theme: they sound cool, the media doesn’t understand the concepts behind them, and their creators make positive claims about them. These inventions have had millions of dollars put into them only for Kickstarter backers and stakeholders to be let down, despite walls of positive press from various sites and media forums.
What can we learn? Sellers sell. That’s what they do. When there’s millions of dollars to be made, does the seller really care if the product only meets 90, 80, or 70 percent of the buyer’s needs? Will most buyers spend more time and money holding the seller accountable, or will they eat the loss or attempt to justify the purchase to themselves? That’s why you see claim after claim and never a bad word unless you have colossal levels of fraud, like Theranos. What else can we learn? These things can raise millions of dollars and never hurt a market. Solar Roadways raised over a million dollars and never threatened existing energy companies.
Buying hype can only serve to dampen our morale and make us cede market share. It can only serve to silence us. You don’t have to be a computer scientist to investigate claims about computer science. Let’s start selling facts and raising consumer awareness. If nothing else, remember: If their product worked, you would be using it.
So I’ve been following the facts on a series of cases picked up by the Batavian and Daily News. The very short story, with some extrapolation, is that a grand jury stenographer contracted by the district attorney was apparently using the AudioSync feature in our modern stenotypes. This caused the defense attorneys to seek dismissals of the indictments. As best I can tell, and after writing Batavian author Howard Owens and one of the attorneys, who had stated it was a Judiciary Law misdemeanor, I pieced together the following with regard to grand jury recording law in New York:
Criminal Procedure Law 190.25(4) makes it very clear that grand jury proceedings are secret. Judiciary Law 325 gets into how it shall be lawful for a stenographer to take grand jury proceedings, and doesn’t explicitly allow audio recording. Penal Law 215.70 talks about unlawful disclosure and lists the crime as a class E felony. Finally, Penal Law 110 tells us an attempted E felony becomes an A misdemeanor.
What can we further infer from all that? Well, as best I can tell, the indictments are only dismissed if it’s shown that the recording altered the testimony or proceedings in some way, and the defense is given the burden of proving that. As of writing, no indictment has been dismissed because of recording. That said, this opens up a serious concern for grand jury stenographers across New York. Recording the grand jury proceedings may be construed as attempted unlawful disclosure, and thanks to Judiciary Law 325, it may be difficult or impossible to argue that such recording is in the course of your lawful duties. Like Frank Housh in the video linked above, I was shocked that we could work in this industry for years and not ever be told the law surrounding that. Admittedly, I was a grand jury stenographer in New York City for months, and while I understood that not recording was a condition of my employment, I did not know that recording could theoretically give rise to a criminal prosecution. It is up to us to keep ourselves and each other informed, and now we know. This is not a joke, and you could go to jail for up to one year and have a criminal record for up to ten years on an A misdemeanor.
That caution stated, as of writing, there has been no prosecution of any grand jury stenographer for that specific reason, so it seems that the district attorneys or assistant district attorneys involved in these cases disagree with defense’s contention that this rises to the level of a misdemeanor. It also appears that recording of the proceedings does not automatically invalidate indictments.
The court rules Part 29 and Part 131 did not come up in my correspondence with anyone involved in this matter, but they are tangentially related and may be worth a review. And remember, nothing written here pertains to federal grand jury proceedings. We are talking strictly the New York State courts.
Any future updates to this matter will be posted right here.