It came to my attention some time ago that Verbit was using a real proceeding’s audio to test its potential transcribers. After entering one’s information, one will get to a screen that encourages him or her to download all the files and put together a transcript from the information and audio given.
But this just takes my criticism of digital reporting and Verbit to a whole new level. Anyone with access to the link from anywhere in the world can just pop on and download a bunch of files from somebody’s case. These files have been accessible since July 2021 that I personally know of, and these files were still accessible as of September 15, 2021.
The whole thing leaves me in a pretty tough position. I want to prove this is happening so that court reporters can warn the legal community. But just dumping the evidence onto the internet a second time will violate the parties’ privacy more than it has already been violated. With heavy redaction, though, we can go through the various files and get a good idea of it. Let’s start with the cover page. Just remember, the redactions were put there by me. In the actual files there are no redactions.
There’s a file labeled TAG, which appears to be the digital reporter or video operator’s annotations. If I am correct about that, this is a window into just how useless the annotations are for a transcriber.
There’s a file containing a notice of deposition. To limit the time spent redacting, I’ll offer up the first page only.
The “must read” file comes next. Since that’s created entirely by Verbit, it’s downloadable here.
Then there’s a Verbit guidelines page, which seems harmless enough. But it hilariously refers to a “USLS” manual. The file is literally named “redone for USLS,” which to me seems to be fairly good circumstantial evidence that Verbit has a connection to US Legal Support. Not only is US Legal potentially defrauding consumers by making bad claims about the stenographer shortage, they might be working with a company so ignorant of good court reporting practice that it posted a proceeding online.
For the sake of completeness, I went looking for a USLS Manual and I found a 2017 version. Interestingly enough, it reads very much like an employee manual and has very specific formats for jobs. Remember, common law employees are all about who has direction and control of the work. I would say that if US Legal is or was using a 150-something page manual to “train” its “freelancers,” those people are actually common law employees and US Legal probably should have been paying employment taxes for them. What a shame it would be if I uploaded that manual and someone let the IRS know there was potentially a failure to withhold those taxes.
Back to Verbit’s files, they offer a template, which is more or less a transcription of the audio file they’re asking transcribers to transcribe. It is the single greatest indictment against digital reporting I have ever seen. The reporter’s name, Hang Nguyen, is misspelled as Han. The term “court reporter” is spelled “core reporter.” There’s a missing apostrophe. There’s a zero in the word “point.” She asks them to state their appearance and how they’re attending, but somehow it’s transcribed as “state your up here.” There are so many errors that quite frankly I hope my reporting colleagues do not let this go and that they take the time to send this to their bar associations. I am quite sure there are stenographic reporters that make mistakes. I personally make mistakes. But this falls well within the territory of “way too many mistakes to be normalized and accepted by our justice system.”
There’s a Kentuckiana reporter worksheet that’s published by Verbit. It’s a pretty standard worksheet, so I will not bother to publish it here.
We get to the audio file, and it’s a 22-minute file. Given that this proceeding is a family court matter between two individuals, it’s not appropriate for me to republish, but again, it was available on the internet for months and being used to screen or train Verbit transcribers. It’s real testimony about a family court matter.
I set out to investigate whether permission had been granted to Verbit to publish these proceedings on the internet. In full disclosure, court reporters have shared audio in our field, but it’s usually a snippet of a word or sentence for clarification purposes and not large chunks of testimony with information that can identify parties. Now, I don’t really like Kentuckiana because of their pro-digital stance, but when I reached out, Michael McDonner seemed very reasonable and made it very clear that permission was not given to distribute this audio.
But what about the attorneys? Maybe John Schmidt said it was okay.
But perhaps Amber Cook had given permission?
I reached out to Hang Nguyen on LinkedIn but I got no response as of writing. I also reached out to Leor Eliashiv from Verbit. Predictably, there was no response. But at the very least, Kentuckiana made a commitment to demand the audio be taken off the internet after I told them where to find it.
For so long our institutions and businesses have been trying to find a way to say we are the superior product. Maybe the answer is to just show consumers what they’re really signing up for if they entrust the future of the legal record to companies like Verbit: tons of errors and potential breaches of privacy. We have to direct people to the many resources to learn stenographic court reporting, such as NCRA A to Z, Project Steno, and Open Steno. We have to get serious about educating consumers. Please consider a donation to Protect Your Record Project today. They have been pioneers and powerhouses in consumer awareness, and it is largely thanks to them that this article will reach thousands.
Within 24 hours after the posting of this blog the files were taken off of the internet.
And despite my attempts to alert them to the inaccuracy in July 2021, nobody could be bothered to correct the article. It’s still wrong as of September 14, 2021.
And just to make this really clear, it’s fact checkable using New York’s business search, which takes maybe 60 seconds. Verbit is a foreign corporation, meaning it is not based in New York.
This might seem like a minor thing, but it points to a larger problem. Media people are not bothering to fact check anything. They’ll go on and on about how the technology is great and new, and how this company is a unicorn valued at a billion dollars, but they’ll miss simple realities, like 85% of AI business solutions being predicted to fail. IBM Watson wasn’t the holy grail of ASR and IBM makes $70 billion in annual revenue. What are the chances that Verbit beat IBM with its $5 million in revenue? I’ll give everyone a hint. Vince McMahon’s theme song tells us exactly what chance they have.
And Verbit is not doing anything to correct false perceptions. They reposted their May 2020 article again on September 14, 2021.
Just for fun, let’s dive into the implications they list here, since it’s being published a second time as if it is still true.
1. The rise of non-compete litigation. I see no reason to believe that this is an accurate assessment. States like New York are banning non-competes under 75,000. Even our sitting President of the United States doesn’t seem to like non-competes very much. So it probably wasn’t true in May 2020 when the courts were closed and probably isn’t true now.
2. Courthouses are closed. True in May 2020. Not really holding true now.
3. Working from home culture. Stenographers adapted to this. There’s no edge to Verbit in that department.
4. High demand for lawyers. Can’t argue here. Our nation of laws needs lawyers, especially in rural areas.
5. Technology is key. They mention how lawyers that know how to send documents electronically and perform video conferences are more desirable. Is this surprising to anyone?
6. Fewer courtroom cases. Verbit has pointed to our stenographer shortage in the past as the casus belli requiring our replacement. If there are fewer courtroom cases, demand is lower than anticipated, and therefore stenographers can meet demand and the whole theme that we cannot has been a marketing farce.
7. Smaller law firms thrive. They’re writing this because smaller law firms have fewer resources to spend figuring out that the article is a sales pitch. Marketing is about how you make people feel. They want to make smaller law firms feel good and try Verbit.
8. New court reporting strategies. In May 2020, laws regarding oaths and the swearing-in of witnesses were changing to adapt to the pandemic environment. This has been a major debate in our field where some businesses ignore procedural rules while others zealously defend them. New York itself has fairly simple guidelines for depositions taken within the state, without the state, and in a foreign country. As page 32 of the Summer 2020 Vermont Bar Journal told us, this situation gets complicated. So it’s not a false statement they’re making, but this is an example of framing. “New” and “court reporting” are designed to make the reader feel like court reporting is changing. Our strategy is the same it’s been for a hundred years, stenotyping what you say while you say it. We just do it with better technology than we had in the 80s.
10. The rise of the remote deposition. Automatic speech recognition thrives in remote work because the audio quality tends to be much clearer, assuming everyone’s connection is good. It’s a closed scenario where everyone is speaking into a microphone. By contrast, the stenographic court reporter can survive anything. Check out 25 seconds of one of my early freelance jobs and let me know how well automatic captioning does there. I was a 20-something year old kid next to a steam radiator. If I had not been taking notes on my stenotype, there’d be no legal record of the proceedings. Automatic speech recognition fails in court reporting for the same reason court reporters get stressed out at lawyers. We have to get every word. Sometimes they stick us in spots where it’s really hard to do our jobs. In today’s world we are occasionally looked down on for asking to change our seat or relaying that a situation is unreportable. We will be very upset if the legal field suddenly decides “yes, we can create the ideal hearing scenario for the computer that we couldn’t bother to do for the human beings we work with every day.” But my money is on one simple truth: people are people, and most of them will never jump through hoops to make a computer “happy” when they can work with a live stenographic reporter who will jump through hoops to make them happy. It’s the same reason customers dread calling any kind of service center nowadays. Getting bounced around by an automated system has got to be one of the most infuriating experiences in modern life. Applying that to the legal record is a masterful level of stupid.
This isn’t anything new from Verbit. They put out questionable marketing materials all the time. They did it again in this undated webpage about digital reporting. Let’s put those “myths” to bed too.
But you know what’s screwed up? Here Verbit is calling digital court reporters highly trained, but not long ago, they were claiming that digital reporting required a workforce that is not highly trained. Again, this is a company with no conviction or facts backing it. It is a chameleon, ready to blend in with whatever way will make it money or sound good.
Let’s keep on reading some digital reporting myths.
AI never has a bad day? Well, in my October 2020 article, YouTube thought the caption for defeating the enemy and extinguishing his life was “to feed my enemy, I extinguish his wife.” In my June 2021 article YouTube AI thought “raise your right hand” was “rage right hand.” There’s two bad days right there. If Verbit’s got better ASR than YouTube, why haven’t they sold it to YouTube yet?
To understand why this is wrong, you have to know a little about the tech and concepts at play. Alexa and Siri are constantly able to learn your voice and tune to your voice. That’s like voice writing. In order to create a uniform ASR program that can get all English speakers all the time and automate that transcription, you need tons of data from all those speakers in all different types of environments. Since new people are being born every day and language is changing a little bit every day, this is basically hopeless. As written in Scientific American, ASR is not perfect and may never be. Just think criminal prosecutions. Does anyone really believe we are going to get defendants to sit there and help the court system train the computer to their voices? “Ah, yes, I think I will just assist the state in my prosecution.”
For anyone that hasn’t caught on, there is a pattern here. There is little substance, a lot of fluff, some great sales tactics, and no real court reporting knowledge. Perhaps most offensive is their reliance on quotes and ideas from the National Center for State Courts, which as far as I can tell just doesn’t like stenographers, since they continually call for digital recording despite some evidence that costs are similar and stenographers are more efficient. I hate to say that about NCSC since they seem to admire community court solutions as much as I do, but that’s where we’re at, they don’t like that my job exists.
I really feel for investors. They’re being recklessly encouraged to throw millions of dollars into something that, from any reasonable view of the facts, has a high chance of failing or stagnating. As I pointed out in my science article, they’re paying Kenyan transcribers maybe a fourth of what Americans are paid for the same work. Any alleged savings don’t go to the consumer; they go to the company. Does the court reporting consumer want the creators of the legal record to be outside of his or her subpoena power? Does the captioning consumer want a company to push down prices so that captioners have a hard time affording continuing education? Is everybody really okay with what is apparently a zombie company coming in and sinking millions of dollars into Rev 2 under the false notion of “future technology?” Livne himself has admitted they’re “over-subscribed” when it comes to funding. It’s quite clear to me that they’re overfunded because they’re turning out to be an overblown transcription company and not the cutting edge of technology. After all, just compare their “over-subscribed” funding of maybe a couple hundred million dollars to the money pit of real AI research. When will the media admit that, and when will investors catch on? That remains to be seen. But very much like US Legal, anything from Verbit needs to be viewed with extreme caution.
For investors looking for a stable return, consider getting involved with stenographic firms. Voice recognition and transcription has been identified as a market with billions of dollars in potential. Stenographers are the most efficient modality in that regard. Where technology companies will overpromise and underdeliver, the stenographic writer has worked out a system that has been going strong and evolving for over a hundred years. A Kentley Insights 2019 report showed a 10% profit as a percentage of revenue for court reporting businesses. As far as I am concerned, a far safer and more stable return is in stenography. If any investor wants to be directed to the more entrepreneurial minds of our profession, I am happy to direct. Please write me at ChristopherDay227@gmail.com.
I reached out to Jim McMillan from NCSC and I have to correct my above position on the organization. He explained that he believed the quote Verbit used from him was from a 2013 post, well before speech-to-text automatic speech recognition was close to usable. The position that NCSC takes tends to be on courtrooms that do not require the transcription of many matters. Obviously, I will always be an advocate for the stenographic reporter, but this is a far different take on it than I previously had, and it is important for our field to see.
About three months ago, after Verbit’s acquisition of VITAC, a well-known captioning provider, I published a strategic overview for captioners and how they can stand up for consumers. Not long ago, a live steno captioner position was posted by VITAC for less than $20 an hour. The position did boast other incentives, such as the potential for health insurance and a 401(k) for full-time captioners. With health insurance being valued by sources like Griffin at $1.52 to $7.42 an hour, it’s fair to say that we can consider a $19.23 hourly rate with benefits a value of about $30 an hour at best and a value of $20.75 at worst.
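For anyone who wants to check that range, here is a minimal sketch that only uses the figures quoted above. It values health insurance alone; it makes no attempt to price the 401(k) or any other incentive, which is presumably why the insurance-only best case lands under the rounder "about $30" figure. The function and variable names are my own, purely for illustration.

```python
# Figures quoted in the paragraph above.
BASE_HOURLY = 19.23      # posted hourly rate
INSURANCE_LOW = 1.52     # low estimate of health insurance value, per hour
INSURANCE_HIGH = 7.42    # high estimate of health insurance value, per hour


def with_benefit(base: float, benefit: float) -> float:
    """Return the hourly rate plus the hourly value of a benefit, rounded to cents."""
    return round(base + benefit, 2)


# Worst case: the low insurance estimate on top of the posted rate.
worst_case = with_benefit(BASE_HOURLY, INSURANCE_LOW)
# Best case counting insurance only (the 401(k) is not valued here).
best_case_insurance_only = with_benefit(BASE_HOURLY, INSURANCE_HIGH)

print(f"Worst case: ${worst_case:.2f}/hr")
print(f"Best case, insurance only: ${best_case_insurance_only:.2f}/hr")
```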
Stenography is a highly specialized skill. But even other highly specialized skills, like realtime voice writing, were undervalued. The voice captioner posting said $30 hourly at the top, but then in the body of the description, a $17/hr training rate was advertised. It was further advertised that $35,000 could be made in the first year. $35,000 divided by 52 weeks in a year is about $673.08 a week. Assuming a 40-hour workweek, that’s about $16.83/hr — close to half the advertised rate!
I thought, “if a company is going to pay its specialized workforce $20 or $30 an hour, certainly I feel bad for the positions that do not have labor shortages or specialized skills.” Then I came across VITAC’s posting for Sales Engineer I (SE1). An SE1’s job is all about onboarding new clients and responding to requests from Operations and Sales personnel. They’re offered $58,000 to $70,000 annually, the equivalent of $27.88/hr and $33.65/hr assuming the same 40-hour workweek. So VITAC’s apparent strategy is to pay the stenographer that is providing the actual service to the consumer about 60% of what they’re paying the salespeople. But just to make sure they look good, they added a modern stenotype to the website.
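The salary-to-hourly conversions above are easy to reproduce. This is a rough sketch assuming the same 52-week year and 40-hour workweek used in the text; the helper name `annual_to_hourly` is made up for this example. It also shows where "about 60%" sits: the $19.23 steno rate falls somewhere between the low and high ends of the SE1 range.

```python
WEEKS_PER_YEAR = 52
HOURS_PER_WEEK = 40  # the 40-hour workweek assumed in the text


def annual_to_hourly(annual_salary: float) -> float:
    """Convert an annual salary to an hourly rate, rounded to cents."""
    return round(annual_salary / (WEEKS_PER_YEAR * HOURS_PER_WEEK), 2)


voice_first_year = annual_to_hourly(35_000)  # voice captioner, first year
se1_low = annual_to_hourly(58_000)           # Sales Engineer I, low end
se1_high = annual_to_hourly(70_000)          # Sales Engineer I, high end

STENO_HOURLY = 19.23  # live steno captioner posting
ratio_vs_se1_low = STENO_HOURLY / se1_low
ratio_vs_se1_high = STENO_HOURLY / se1_high

print(f"Voice captioner, first year: ${voice_first_year:.2f}/hr")
print(f"SE1 range: ${se1_low:.2f}/hr to ${se1_high:.2f}/hr")
print(f"Steno pay as a share of SE1 pay: {ratio_vs_se1_high:.0%} to {ratio_vs_se1_low:.0%}")
```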
Of course, having been in the field the last eleven years, I also have some basic familiarity with the rates that captioners and CART providers charge. $20 to $30 for a “live steno captioner” job seemed low to me. Knowing how companies in the court reporting sector have taken advantage of young reporters, I requested information from several service providers in the field with varying degrees of experience in the hopes that I could get solid info out there for young or unknowing captioners. This is what I learned:
Provider A stated that they did not provide broadcast captioning, but did caption telephone calls and Zoom meetings at a rate of “almost $40 an hour” through Innocaption. It was stated that the work was super easy and may even be possible for students to take, though Provider A did mention they usually do not recommend students work. Asked about their understanding of broadcast captioning rates, Provider A stated broadcast captioning was higher.
Provider B stated “Even as a brand new CART provider, I never made less than $60 an hour. With one company, after I got my [certification], they bumped me to $65. Another company has always been $65 across the board. The third company has different rates for different jobs. Classes are $60 but if you are doing town halls, harder jobs, it is $75. Fourth Company was a smaller company and [they] paid me $80 per hour, and it was only classes. First company I spoke of is out of Illinois, second is Denver, third is California, fourth is Chicago. And I have never done broadcast captioning. I hope that helps!”
Provider C stated that they performed work for call services that did live captioning and were offered $40 an hour, but they were only taking down one side of a conversation.
Provider D, a 27-year veteran of our field and certified realtime reporter, stated that when they took on captioning work, it was 2014, they had a full-time job, and they did not need to make the same high rates independent contractors usually did. They made $50/hr in 2014 with a 2-hour minimum. That work came to a close. Come 2020, Provider D was again offered $50/hr and attempted to negotiate for $80 because the work was dense and contained a lot of science. The firm “did not know” if they could pay $80, and asked Provider D to come down to $70, which Provider D did with the caveat that they would renegotiate at a later date.
Provider D also received a call from a California-based company and negotiated $100/hr with a 2-hour minimum. The firm paying $100/hr expected no rough draft after events. The firm paying $70/hr required a rough draft. A third firm in Florida offered $80/hr. Provider D stated that the swing was generally between $50/hr and $100/hr and that they would never work for $20/hr because captioning is more than knowing realtime: you have to know how to connect to a multitude of platforms and devices, as well as troubleshoot on the fly.
Provider E wrote “My first response when I read [the $20 rate] was OMG! Yeah, that is SUPER low! So here’s what I know from where I sit in the Pacific Northwest:
There are four levels of captioning that I have ascertained. 1. Broadcast captioning, which is a whole other sphere that requires encoding software and usually above and beyond training to do TV captioning. I don’t really know much about that…” “I don’t know what rates they’re charging, but it has to be higher because the software is not cheap, like a $7k add-on with Eclipse.
2. CART captioning, either in person or remote, through a freelance company or own shingle. This is stuff like government meetings, group conferences, seminars and such, $120-$125/hr with 2-3 hour minimum in my area. We are sometimes requested to bring a projector and/or screen, which adds to rental fees. About half of people charge after hours rates on this. I feel the remote world has let this go a bit. But I know when I go back in person that’ll definitely go back in.
3. Schools. One on one with one student. they are notoriously cheap in my opinion even though they’re being paid by ADA funds, from my understanding. Most commonly in my area $85/hr, 2-hr min. But I’ve negotiated more for after hours and weekend work with one college.
4. There is one company whose name escapes me, probably more, who provide a captioner for phone calls. they only pay $30/hr. I was really bothered by this undercutting of the industry when I found out about the rates folks were accepting. But a reporter I talked to about it said [it’s] mostly sitting there doing nothing because you’re only writing half of the conversation, no transcripts, so super easy work. She considered it easy supplemental income.
That $20 is WAY out of line, especially if that requires continuous writing…”
Provider F wrote “everyone has their baseline. I will do $70 and hide my head, for a friend. But my default is $80 or $85. However, if it’s MY work, my clients, I charge 100 or 125 and pay $80 or $90 or $100 depending on the job…”
According to the Bureau of Labor Statistics inflation calculator, $50 in 2014 money is worth $58.08 in June 2021 dollars. $100 in 2014 money is worth $116.15 in June 2021 dollars. Again, for new captioners, this should put into perspective the value of the work and the importance of occasional raises.
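To make the point about raises concrete, here is a small sketch. It does not use BLS data directly. The multiplier is backed out of the $50 to $58.08 figure above, which is my own shortcut, so results can be off by a cent from what the calculator itself reports (the $100 figure comes out a penny higher here). The names are hypothetical.

```python
# Multiplier implied by the article's own figure: $50 (2014) -> $58.08 (June 2021).
# Assumption: derived from that one data point, not pulled from BLS.
IMPLIED_FACTOR = 58.08 / 50


def to_june_2021_dollars(amount_2014: float) -> float:
    """Scale a 2014 dollar amount by the implied inflation multiplier."""
    return round(amount_2014 * IMPLIED_FACTOR, 2)


# A captioner still paid the same nominal $50/hr in June 2021 has taken
# a real-terms pay cut of this size compared to 2014.
real_pay_cut = 1 - 50 / to_june_2021_dollars(50)

print(f"$50 in 2014 is about ${to_june_2021_dollars(50):.2f} in June 2021")
print(f"$100 in 2014 is about ${to_june_2021_dollars(100):.2f} in June 2021")
print(f"Flat $50/hr from 2014 to 2021 is a real cut of about {real_pay_cut:.1%}")
```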
“Thank you for your question about our company. StenoCaptions LLC is proud to be a minority woman-owned business. Our team of independent contractor captioners earn between $100-120 per hour depending on their qualifications and length of time in the field. As our website discloses, we charge $140 per hour for most jobs. This means that our captioners, who are the people doing the difficult and demanding work of providing live accurate Communication Access Real-time Translation, net between 70-86% of what we bill. StenoCaptions LLC is proud to support our highly trained, highly reliable stenographic captioners.
We are happy to be quoted on your blog. Let us know if you have any further questions.
Sincerely, Wendy Baquerizo and Joshua Edwards Co-owners StenoCaptions LLC StenoCaptions.com”
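The share quoted in that letter can be checked from the two rates it gives. A quick sketch, using only the $100 to $120 pay range and the $140 billing rate stated above (the function name is mine); the low end works out to roughly 71%, in line with the letter's rounded range.

```python
BILL_RATE = 140               # what the firm says it charges per hour for most jobs
PAY_LOW, PAY_HIGH = 100, 120  # what it says captioners earn per hour


def captioner_share(pay: float, bill: float) -> float:
    """Fraction of the billed rate that goes to the captioner."""
    return pay / bill


low_share = captioner_share(PAY_LOW, BILL_RATE)
high_share = captioner_share(PAY_HIGH, BILL_RATE)

print(f"Captioner share of billing: {low_share:.0%} to {high_share:.0%}")
```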
As of writing, there is little doubt in my mind that the rates being offered by VITAC, and I suppose by extension Verbit, are well below what could be considered a market rate no matter which market in the United States we examine. Again, in the best-case scenario of a $30/hr value, they are paying 40% less than what Provider D, whose full-time job was not captioning, made in 2014! A company like StenoCaptions is literally paying up to six times as much to their providers. This has some troubling implications. Verbit’s entire model, as I understand it, is automatic speech recognition transcription coupled with a human transcriber. Verbit claims on its site that after 8 hours it can provide ADA-compliant material at 99% accuracy, at least that’s how I understand their infographic. They also make the claim of 95% accuracy with an 8 to 12-second delay.
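The pay comparison at the top of that paragraph comes down to two bits of arithmetic, sketched here with the rates already quoted in this post. The "six times" figure corresponds to the top of the StenoCaptions range measured against a $20/hr offer; the bottom of the range is five times. Function names are illustrative only.

```python
def percent_less(lower: float, higher: float) -> float:
    """How much less `lower` is than `higher`, as a percentage of `higher`."""
    return (1 - lower / higher) * 100


def multiple_of(rate: float, baseline: float) -> float:
    """How many times larger `rate` is than `baseline`."""
    return rate / baseline


# Best-case $30/hr value versus Provider D's $50/hr in 2014.
gap_vs_provider_d = percent_less(30, 50)

# StenoCaptions' stated $100 to $120/hr versus a $20/hr offer.
low_multiple = multiple_of(100, 20)
high_multiple = multiple_of(120, 20)

print(f"$30/hr is {gap_vs_provider_d:.0f}% less than $50/hr")
print(f"$100 to $120/hr is {low_multiple:.0f}x to {high_multiple:.0f}x a $20/hr rate")
```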
We have to deal with the hard fact that, in its series A funding, Verbit made the claim that its “adaptive speech recognition tech” could generate detailed transcriptions with over 99 percent accuracy at record speeds. In its series B funding, Verbit, through CEO Livne, said it would not take the human transcriber out of its workflow. Now it’s apparent that Verbit regards “record speeds” as 8 hours. We have to deal with the hard fact that, when studied by people at Stanford, an entire host of automatic speech recognition products from companies far larger than Verbit had accuracy levels that ranged from 25 to 80 percent depending on who was speaking.
There’s just no good reason to believe that Verbit consistently has the capabilities that it says it has. This is all part of the claim game that I demonstrated earlier this year. In the video I just linked, I tell six lies, one partial truth, and one actual truth in fifteen seconds. I challenged my readers to think about how long it would take to prove the truth or falsity of each claim. I have to make the same challenge here. Verbit’s website boasts that they are trusted by “400+ organizations,” but when one flips through the organization list, one sees about 16 organizations. Even if one wanted to spend the time and energy to fact check the claim of being trusted by 400 organizations, one could not do so. Why bring it up? Because stenographers need to be aware that a lot of the “intimidating” information out there falls apart when given any sort of investigation. Likewise, there are entities out there that will try to convince young captioners that their skill is not worth very much. I’m publishing this information today to counter that.
Perhaps the low pay wouldn’t bother me, but it goes directly against digital recording’s main talking point of “we need to record it because there are not enough stenographers to meet demand.”
Maybe the shortage of stenographic court reporters and captioners is exacerbated by companies like this coming in and offering pay that’s nowhere near the market rate. There’s no innovation involved. It’s a shameless war on workers. It doesn’t take a particularly bright person to say “gee, there would be more money for the company if only we could reduce the labor costs.” It also doesn’t take a particularly bright person to point out to captioners that they cannot accept this if they want a healthy field. We’re going to need the entrepreneurial individuals among us to consider jumping in, setting up shop, and competing. We’re going to need captioners to demand the pay they deserve. So if you come across an inexperienced reporter getting told they’re only worth $20/hr, please share this with them and be a major part of pushing back.
Addendum: I realized after my initial draft that the $20 an hour could be a full-time job. Assuming 7 hours a day, five days a week, 52 weeks a year, that’s a salary of about $36,400, below the national average, and well below what I started working for as a court reporter, around $70,000 a year. So even looking at it from the standpoint and potential of “more hours for less pay,” I am unimpressed and captioners should be too.
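The addendum's salary figure can be reproduced in a few lines. This sketch assumes exactly the schedule stated above (7 hours a day, five days a week, 52 weeks a year, no unpaid time off) and compares the result with the roughly $70,000 starting figure I mentioned. The names are my own.

```python
HOURLY_RATE = 20
HOURS_PER_DAY = 7
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52

# Annual pay under the full-time schedule described in the addendum.
annual_salary = HOURLY_RATE * HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR

# Rough starting salary for a court reporter cited in the addendum.
COURT_REPORTER_START = 70_000
share_of_start = annual_salary / COURT_REPORTER_START

print(f"Annual salary at ${HOURLY_RATE}/hr: ${annual_salary:,}")
print(f"That is about {share_of_start:.0%} of ${COURT_REPORTER_START:,}")
```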
Verbit’s constant attraction of investor money and recent acquisition of VITAC has set off a few optimistic waves in the media. Verbit bills itself as a unicorn, that is, a startup with a valuation of over $1 billion. Court reporters worried about that kind of classification should be aware that it means nothing. Fyre was set to be a unicorn, and yet Billy McFarland’s venture did little more than light Fyre’s investors’ money on fire and land him in jail. Theranos was valued at $10 billion. It’s now worthless and Elizabeth Holmes may face jail time. Powa was a unicorn with a valuation of $2 billion. That didn’t work out either.
I was shocked to come across an article that states that there are hopes of Verbit becoming a publicly-traded company by 2022. Ignoring things that the article gets incorrect, such as the firm being Manhattan-based (other articles state it’s an Israeli company), there are few reasons I can see for Verbit to become a publicly-traded company. Becoming publicly traded would allow investors to see the profit or loss. To give a great example, VIQ Solutions, parent of Net Transcripts, is publicly traded. It loses money every quarter despite reporting revenue in the millions. Companies that lose money aren’t attractive to investors and that’s why VIQ is about $7 a share today. Remaining private allows companies to continue a kind of “shell game” and operate despite being unprofitable. Based on Livne’s broadcasting of Verbit’s revenue and silence with regard to profit, I suspect Verbit would have the same exact problem, lots of revenue and little or no profit.
Going public would serve only one purpose in my view: an exit for current investors. Current investors could make a big deal about how it’s a company valued at over $1 billion sitting “on top” of a market that’s allegedly worth $30 billion and watch as new investors dive in and take the bait. In the article above, Livne states the funding rounds were over-subscribed. That means they had a lot of money poured on them in their funding rounds that they did not need. If they’re over-funded, going public clearly wouldn’t provide the company with funds it needs (remember, it’s overfunded). Again, it would give the current investors an exit. They get to cash out, some suckers get to buy in, and what happens after that is anybody’s guess. It remains a little strange to me that journalists buy the idea that a company that is maybe half a decade old has automatic speech recognition technology that is better than basically all the major players in the market. Those major players, according to one study, have accuracy levels between 25 and 80 percent.
My prediction is that Verbit will either fail to go public in 2022, or it will go public and take a hard fall sometime down the road after Livne and other investors have cashed out. I sincerely hope it’s the latter, because at least it would be a happy ending for the founder. Verbit finds itself in a precarious position of being a large target for the IRS. In the United States, one is a common law employee when the “employer” has direction and control. Verbit, according to the linked article, is using 30,000 freelancers to carry out its business model. If Verbit does not have direction and control, it cannot assure quality. If it does have direction and control, those are 30,000 employees it is failing to withhold taxes for. Like other companies that rely heavily on independent contractors, Verbit may soon find itself under attack from federal and state tax authorities where it conducts business or earns income. Anyone in the world can confidentially file a form 3949-A that puts Verbit under the spotlight, and that can only translate into headaches for the company and its investors. With that kind of exposure, I would not be investing in the company any time soon.
Even in a world where authorities turn a blind eye and there isn’t a decline in the company’s financial health, Verbit moving public could only give its competitors more information, which is something I’m looking forward to.
With the news that Verbit has bought VITAC, there was some concern on steno social media. For a quick history on Verbit, it’s a company that claimed 99 percent accuracy in its Series A funding. In its Series B funding, the company admitted that its technology would not replace the human. Succinctly, Verbit is a transcription company whose transcribers are assisted by machine learning voice recognition. Of course, this all has the side effect of demoralizing stenographers who sometimes think “wow, the technology really can do my job” because nobody has the time to be a walking encyclopedia.
But this idea that Verbit, a company started in 2016, figured out some super secret knowledge is not realistic. To put voice recognition into perspective, it’s estimated to be a market worth many billions of dollars. Microsoft is seeking to buy Nuance, the maker of Dragon, for about $20 billion. Microsoft has reportedly posted revenue over $40 billion and profit of over $15 billion. Verbit, by comparison, has raised “over $100 million” in investor money. It reports revenue in the millions and positive cash flow. Another company that reports revenue in the millions and positive cash flow? VIQ Solutions, parent of Net Transcripts. As described in a previous post, VIQ Solutions has reported millions in revenue and a positive cash flow since 2016. What’s missing? The income. Since 2016, the company hasn’t been profitable.
Obviously, things can turn around. Companies can go long periods of time without making a profit, bounce back, and be profitable. Companies can also go bankrupt and dissolve a la Circuit City or be restructured like JCPenney. The point is not to disparage companies on their financials, but to give stenographic captioners real perspective on the information they’re reading. So, when you see this blurb here, what comes to mind?
Hint. What’s not being mentioned? Profit. While this is not conclusive, the lack of any mention of profit tells me the cash flow and revenue are fine, but there are no big profits as of yet. Cash flow can come from many things, including investors, asset sales, and borrowing money. Most of us probably make in the ballpark of $50,000 to $100,000. Reading that a company raised $60 million, ostensibly to cut in on your job, can be pretty disheartening. Not so once you see that they’re a tiny fraction of the overall picture and that players far bigger than them have not taken your job despite working on the technology for decades.
Moreover, we have a consumer protection crisis on our hands. At least one study in 2020 showed that automatic speech recognition can be 25 to 80 percent accurate depending on who’s speaking. There are many caption advocates out there, such as Meryl Evans, trying to raise awareness on the importance of caption quality. The messaging is very clear: automatic captions are crap (autocraptions), they are often worse than having no captions, and a single wrong word can cause great confusion for someone relying on the captions. Just go see what people on Twitter are saying about #autocraptions. “#NoMoreCraptions. Thank you content creators that do not rely on them!”
This isn’t something I’m making up. Anybody in any kind of captioning or transcription business agrees a human is required. Just check out Cielo24’s captioning guide and accuracy table.
If someone’s talking about an accuracy level of 95 percent or better, they’re talking about human-verified captions. If you, captioner, were not worried about Rev taking away your job with its alleged 50,000 transcribers, then you should not throw in the towel because of Verbit and its alleged 30,000 transcribers. We do not know how much of that is overlap. We do not know how much of that is “this transcriber transcribed for us once and is therefore part of our ‘team.'” We do not know how well transcription skills will fit into the fix-garbage-AI-transcription model. The low pay and mistreatment that come with “working for” these types of companies are going to drive people away. Think of all the experiences you’ve had to get you to your skill level today. Would you have gotten there with lower compensation, or would you have simply moved on to something easier?
Verbit’s doing exceptionally well in its presentation. It makes claims that would cost quite a bit of time and/or money to disprove, and the results of any such investigation would be questioned by whomever they did not favor. It’s a very old game of making claims faster than they can be disproven and watching the fact checkers give you more press as they attempt to parse what’s true, partially true, and totally false. This doesn’t happen just in the captioning arena. It happens in legal reporting too.
This seems like a terrifying list of capabilities. But, again, this is an old game. Watch how easy it is.
It took me 15 seconds to say six lies, one partial truth, and one actual truth. Many of you have known me for years. What was what? How long will it take you to figure out what was what? How long would it take you to prove to another person what’s true and what’s false? This is, in part, why it is easier for falsehoods to spread than the truth. This is why in court and in science, the person making a claim has to prove their claim. We have no such luxury in the business world. As an example, many years ago in the gaming industry Peter Molyneux got up on stage and demo’d Milo. He said it was real tech. Here was this dynamically interactive virtual boy who’d be able to understand gamers and their actions. We watched it with our own eyes. It was so cool. It was BS. It was very likely scripted. There was no such technology and there is no such technology today, over eleven years later. Do you think Peter, Microsoft, or anybody got in trouble for that? Nope. In fact, years later, he claimed “it was real, honest.”
Here’s the point: Legal reporters and captioners are going to be facing off with these claims for an indeterminate amount of time. These folks are going to be marketing to your clients hard. And I just showed you via the gaming industry that there are zero consequences for lying and that anything that is lied about can just be brushed up with another lie. There will be, more or less, two choices for every single one of you.
Compete / Advocate. Start companies. Ally with deaf advocates.
Watch it happen.
I have basically dedicated Stenonymous to providing facts, figures, and ways that stenographers can come out of the “sky is falling” mindset. But I’m one guy. I’m an official in New York. Science says there’s a good chance what we expect to happen will happen and that’s why I fight like hell to get all of you to expect us to win. That’s also why these companies repeat year after year that they’re going to automate away the jobs even when there’s zero merit or demand for an idea. You now see that companies can operate without making any profit, companies can lie, much bigger companies haven’t muscled in on your job, and that the giant Microsoft presumably looked at Verbit, looked at Nuance, and chose Nuance.
I’m not a neo-luddite. If the technology is that good, let it be that good. Let my job vanish. Fire me tomorrow. But facts are facts, and the fact is that tech sellers take the excellent work of brilliant programmers and say the tech is ready for prime time way before it is. They never bother to mention the drawbacks. Self-driving cars and trucks are on the way, don’t worry about whether they kill someone. Robots can do all these wonderful things, forget that injuries are up where they’re in heaviest use. Solar Roadways were going to solve the world’s energy problems but couldn’t generate any energy or be driven on. In our field, lives and important stakeholders are in danger. What happens when there’s a hurricane on the way and the AI captioning tells deaf people to drive towards danger?
Again, two choices, and I’m hoping stenographic captioners don’t watch it happen.
I had a lot of fun writing the Verbit investors article. But the more I explore opinions and ideas outside our steno social circles, the more I see that most people totally don’t get stenographers or the work we put in. A lot of us have had sleepless nights trying to get a daily out, time lost for ourselves or our families trying to do the job we signed up for, or some amount of stress from someone involved with the proceeding being unhelpful or antagonistic. It happens, we take it in stride, and we make the job look easy. So it doesn’t surprise me very much when people say “why not just record it?” It doesn’t surprise me that investors threw money into the idea that technology could disrupt the court reporting market. But I can only hope that proponents of digital really take the time to understand and step back from the cliff they’re being pushed towards.
For this exercise, we’re going to be exploring Verbit’s own materials. They recently circulated a graphic that showed the “penetration” of digital into the court reporting market. It shows 5 to 10 percent of the deposition market taken by digital, and 65 to 75 percent of the court market taken by digital. It also notes that only 25 to 35 percent of courts are digitally transcribed. I take this to mean that while they have 75 percent of the “court market,” they only transcribe about 25 percent of it. This is a massive problem. So the technology, when it’s not breaking down in the middle of court (29:20), is ready to record all these proceedings. But you only have the capacity to transcribe about a third of that. So in this magical world where suddenly you have every deposition, EUO, and court proceeding, where are you going to get all of these people? We’re talking about multiplying your current workforce by 28 assuming that every person you hire is as efficient as a stenographer. And the math shows that every stenographer is about as efficient as 2 to 6 transcribers. So we’re really talking about multiplying your current workforce by 56 to 168 times, or just creating larger backlogs than exist today. By not using stenographers, Verbit and digital proponents are setting themselves up for an epic headache.
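For anyone who wants to check that last bit of arithmetic, here is a minimal sketch in Python. The 28x starting figure and the 2-to-6 transcribers-per-stenographer range are the numbers from the paragraph above; the variable and function names are mine and purely illustrative.

```python
# Figures taken from the paragraph above; names are illustrative only.
BASE_MULTIPLIER = 28              # workforce multiple if every hire matched a stenographer
TRANSCRIBERS_PER_STENO = (2, 6)   # low and high estimates of transcribers per stenographer


def workforce_multiplier_range(base, per_steno):
    """Scale the base multiplier by the low and high efficiency estimates."""
    low, high = per_steno
    return base * low, base * high


low, high = workforce_multiplier_range(BASE_MULTIPLIER, TRANSCRIBERS_PER_STENO)
print(f"Workforce would need to grow {low}x to {high}x")
```

Change either input and the range moves with it, which is the point: the less efficient each transcriber is compared to a stenographer, the bigger the hiring problem gets.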
Of course, this is met with, “well, there’s a stenographer shortage.” But what you have to understand is that we’ve known that for seven years now. All kinds of things have happened since then. You’ve got Project Steno, Open Steno, StenoKey, A to Z, and Allison Hall reportedly getting over a dozen school programs going. Then you have lots of people just out there promoting or talking about the field through podcasts, TV, and other news. Showcasing the shortage and stenography has brought renewed interest in this field, and we are on track to solve this. Again, under the current plan, you would need as many as 60,000 transcribers just to fill our gap, and the turnover will probably be high because the plan promotes using a workforce that does not require a lot of training. So if you’re talking about training and retraining 60,000 people again and again over the next decade, I am quite sure you can find 10,000 or so people who want to be stenographic court reporters.
Look, I get it, nobody goes into business without being an optimist. But trying to upend a field with technology that doesn’t exist yet is just a frightening waste of investor money. How come when you sell ASR, it’s 99 percent accurate, but when Stanford studies the ASR from the largest companies in the world, it’s 60 to 80 percent accurate? How come when you sell digital it’s allegedly cheaper and better, but when it’s looked at objectively it’s more expensive and comes with “numerous gaps and missing testimony?” These are the burning questions you are faced with. There’s an objectively easier way of partnering with and hiring stenographers. If you don’t, you’re looking at filling a gap of 10,000 with 60,000 people, or multiplying the current transcription workforce of 50,000 by 56 (2.8 million). In a world of just numbers, this sounds great. Three million jobs? Who wouldn’t want that? But not far into this experiment you’ll find that people don’t grow on trees and the price of the labor will skyrocket unless you offshore all of the work. What happens when attorneys catch onto the fact that everything is being offshored and start challenging transcripts? Does anyone believe that someone in Manila is going to honor subpoenas from New York? Again, epic headache.
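The headcount figures in that paragraph are easy to verify. Below is a rough sketch in Python, assuming the high-end estimate of six transcribers per stenographer for the gap and the low-end workforce multiplier of 56. All of the inputs come from the text; the names are made up for illustration.

```python
# Inputs are the figures quoted in the text; names are illustrative only.
STENO_GAP = 10_000                # stenographers needed to close the shortage
TRANSCRIBERS_PER_STENO_HIGH = 6   # high-end estimate of transcribers per stenographer
CURRENT_TRANSCRIBERS = 50_000     # current transcription workforce cited above
WORKFORCE_MULTIPLIER_LOW = 56     # low-end multiplier cited above


def transcribers_to_fill_gap(gap, per_steno):
    """Transcribers needed to replace a given number of stenographers."""
    return gap * per_steno


def scaled_workforce(current, multiplier):
    """Total transcribers needed if the current workforce is multiplied."""
    return current * multiplier


gap_fill = transcribers_to_fill_gap(STENO_GAP, TRANSCRIBERS_PER_STENO_HIGH)
total = scaled_workforce(CURRENT_TRANSCRIBERS, WORKFORCE_MULTIPLIER_LOW)
print(f"To fill the gap: {gap_fill:,} transcribers")
print(f"To take the whole market: {total:,} transcribers")
```

Even at the friendliest end of the estimates, the second number lands in the millions.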
So if I could get just one message out to Verbit leadership and all the people begging for us to “just accept technology,” it would be to really re-examine your numbers and your tech. The people under you are going to tell you that a new breakthrough is just around the corner, that things are going well, and that you shouldn’t worry. But you should worry, because you very well might find yourself a pariah in your industry like Peter Molyneux ended up in his. If you’re not familiar, Peter became famous for promising without delivering. One of the most prominent examples of this was E3 2009, where he stood up on stage and introduced Milo. This tech was going to be interactive. It was going to sense what you were doing and respond to it. It turns out it was heavily scripted. The technology did not and still does not exist to do what was being talked about and presented to consumers. Now, anyone with a bit of sense doesn’t listen to Peter.
If the ASR tech worked, why not sell it to us at $10,000 a pop multiplied by the 25,000 stenographers in your graphic and walk away with a cool $250 million? It does what we do, right? So why aren’t we using it? Why aren’t you marketing it to us? It’s got to be a hell of a lot easier to convince 25,000 stenographers than it is to convince 1.3 million lawyers. Sooner or later, Legal Tech News and all the other news people are going to pick up on the fact that what you are selling is hype and hope. So, again, consider a change of direction. Stop propping up STTI, shoot some money over to the organizations that promote stenography, and partner up with steno. You’d be absolutely amazed how short people’s memories are when you’re not advocating for their jobs to be replaced with inferior tech. Take it from somebody who’s done the sleepless nights and endless hours in front of a monitor transcribing: this business isn’t easy. But if you trust stenographers, we’re going to keep making it look easy, and we’re going to make every pro-steno company a lot of money.