A Word on AI and Stenography

I’ve said this before, but it feels like AI is ubiquitous and in everything these days. It spreads a lot of bad press for us stenographers in that people believe we are or will soon be replaceable. We can further extrapolate from the Pygmalion effect that those beliefs impact reality.

As many know, I’m an amateur programmer. I know relatively little about the top-of-the-line tech and can only code on a very basic level. That said, the more I learn conceptually, the more I’m in awe of just how far computers have come, and how far they have to go. You see it every day on your smartphone and in your steno software. Computers are hard at work and designed to do amazing things.

Here is the thing about computers: They only do what you tell them to do. You have to come up with a set of instructions, an algorithm, that gets them from point A to point B. They solve problems, but only using the instructions you give them. Even if you come up with the instructions, the results can be useless. We can imagine problems as mathematically solvable or unsolvable, finite or infinite. An example of an infinite problem is the Fibonacci sequence. You take the last two numbers in the sequence and add them together to get the next one. This stretches into infinity. You can easily write a program to generate Fibonacci numbers, but the computer would die before generating them all because there are infinitely many of them.
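To make that concrete, here is a minimal sketch in Python of a Fibonacci generator, where each new number is the sum of the two before it. The function name and the choice to start at 0 and 1 are my own assumptions for illustration. The loop never ends on its own, so the example only asks for the first ten numbers.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers forever, starting from 0 and 1."""
    a, b = 0, 1
    while True:  # no exit condition: the sequence has no last number
        yield a
        a, b = b, a + b  # the next number is the sum of the previous two


# Take only the first ten values; asking for "all of them" would never finish.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```

The instructions fit in a handful of lines, but the job they describe can never be completed, which is the point of the example.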

Then there are solvable problems. Chess is considered a solvable problem because it is a game with a finite number of pieces, spaces, and moves. There’s a problem, though. There are so many moves in chess that just the datasets for having 7 pieces at the end of the game (Lomonosov tablebases) are said to be 140 terabytes of information. To put that into perspective, it’s been estimated that all the books in the world would fit in about 60 terabytes. Even if you had a supercomputer capable of generating every possible move in chess, the information would be absolutely useless to you, because to digest all of it would be the equivalent of reading every book ever written thousands of times.

So let’s think of AI and audio in terms of problem solving. The most basic way to describe Alexa and Siri is that they listen to you for keywords, and check what you say against their database, and decide what to do based on that algorithm we talked about. Let’s face it, there are only maybe 200,000 words in the English language. You could store every single one as a large audio file with less than 700 GB. Here is the deal: computers don’t hear in the traditional sense. They’re taking what you say and presenting educated guesses based on all the data they have. So now, if you will, imagine all 200,000 English language words and every combination they could possibly be in. To put it in perspective, it is a way bigger number than this. Now let’s add all the different ways words might be said, or all the different scenarios that might interfere with how the computer is “hearing.” Let’s add all the different accents and dialects of English.
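As a rough illustration of how fast those combinations pile up, here is a small Python sketch. It assumes the 200,000-word figure from above and simply counts every ordered string of words of a given length, repeats allowed. Most of those strings are not sentences anyone would say, so treat this as an upper bound on the search space and not as a description of how any real speech recognition system works. The function name is made up for the example.

```python
VOCABULARY_SIZE = 200_000  # the rough English word count used in this post


def ordered_sequences(vocabulary_size, length):
    """Count every ordered string of `length` words, with repeats allowed."""
    return vocabulary_size ** length


for length in (1, 2, 5, 10):
    count = ordered_sequences(VOCABULARY_SIZE, length)
    # len(str(count)) - 1 is the power of ten the number has reached
    print(f"{length:>2} words: on the order of 10^{len(str(count)) - 1} sequences")
```

And that is before accents, dialects, crosstalk, or background noise enter the picture.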

Let me say this: It is very likely, in my mind, that someday computers will be programmed to hear as well as stenographers in any given situation. It’s a solvable problem. It’s a winnable game. But right now, based on what I know, there’s an indeterminate amount of time and money that it’ll take to get to a point where it is perfect, or at least 95 percent or better, in most or all scenarios in a reasonable amount of time. Take for a moment the example of Solar Roadways. Pave the roads in solar panels to solve America’s energy crisis. Millions of dollars were poured into this solution, and it failed. Remember, solvable problem, winnable game. Finite number of people with finite energy needs. Failed anyway. Speech-to-text is estimated to be worth billions of dollars. But what if it takes 100 more years to solve? How many millions or billions of dollars need to be lost before the solution is declared “good enough”? Remember, they can sell Alexa and Dragon today for piles of money. They don’t need 95 percent. The exponential growth of computers has ended, and unless the experts bring us quantum computing or some other huge leap in technology, we’re looking at computers costing more money to upgrade.

Those companies you see that are touting transcription AI in 2019 are doubtlessly having transcribers fix AI-prepared transcripts at best. Their game is psychological. It’s not cost saving, it’s cost shifting from the worker to the boss. That’s why it’s not being sold to the public. It’s a magic trick. Look to the left while the magician rolls the coin to the right. It is in our best interest as stenographers to call this out when appropriate, and continue to bolster our own magic skills and industry as the go-to for the hearing impaired and legal communities. Could some geniuses come along and program your replacement next year? Sure. But one thing that you should understand is that it’s not very likely, and buying the hype before they have a product to sell is only going to hurt our morale and livelihoods. We have our method. We have a product. We’ve got more brains, voters, and history in the field. So do yourself and all of us a favor, don’t buy the hype, and the next time you meet a transcriber working for Fake AI Transcription Corp, LLC, tell them they can double their earnings and better themselves by joining the stenographic legion. If a supercomputer is required to solve Chess, what do you believe is required to get automatic speech recognition to 95 percent?

May 26, 2019 Edit:
I should add that it’s obvious computers are becoming ruthlessly good at transcribing one speaker, especially in a closed or suitable environment. There are hours of video on that. It’s the introduction of multiple speakers in a less-than-perfect environment where the technology struggles, probably because of all those mathematical issues talked about above.

June 18, 2019 Edit:

A post recently made its rounds on social media claiming a computer science PhD couldn’t see the perfect transcription coming out any time soon. It stands in stark contrast to the claims of some that the technology is already perfect.

August 17, 2019 Edit:

Another article came to light showing that Facebook Messenger and other automatic transcription apps are actually using human transcribers behind the scenes. Using my amateur knowledge of computer coding, I can say this is clear evidence that they need data (the transcriptions) to feed into the machine learning algorithms. Further, if they’re not paying their transcribers exceptionally well and bad data is being inputted, it could ultimately make automatic transcription programs worse. Expect some pretty big delays on the AI transcription front.

August 25, 2019 Edit:

I had created a “mock voice recognition video” just to prove how easy it would be for a company to lie about its voice recognition progress. I coded a computer program that spits back whatever text you give it at a set words per minute. So next time you’re at an automatic transcription demonstration, ask yourself if what you’re seeing is automatic or staged. I often give the example of Project Natal and Peter Molyneux. Gamers were made to believe that the Milo demonstration of Project Natal was a showcase of technology that was coming out. The truth broke years later that the demonstration was heavily scripted, and over ten years later, no such technology exists. Similarly, when someone tells you that their audio transcription program is flawless — question whatever you’re seeing and realize how easy it is to stage and sell things.
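To show how little it takes, here is a minimal Python sketch of that kind of program. It is not the code from my video; the function name, the default speed, and the `out` and `sleep` parameters are all assumptions made for the example. It never touches a microphone. It just prints text you already typed, one word at a time, at a set words per minute.

```python
import sys
import time


def fake_transcribe(text, words_per_minute=200, out=sys.stdout, sleep=time.sleep):
    """Print `text` word by word at a steady pace. No audio is involved."""
    delay = 60.0 / words_per_minute  # seconds to wait after each word
    words = text.split()
    for word in words:
        out.write(word + " ")
        out.flush()  # show each word immediately, like a live feed
        sleep(delay)
    out.write("\n")
    return len(words)
```

Calling something like `fake_transcribe("Good morning, counsel. Please state your appearances.", words_per_minute=225)` while someone reads the same script aloud would look a lot like live recognition to anyone watching the screen.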

The Audio Sink

We’ll try not to pontificate too much beyond the title, but it’s time to jump right into discussion on Audio Sync technology. For a quick overview to newbies, the aptly acronym’d AS is basically an audio recording contemporaneously taken with your stenographic notes that allows you to jump to that place in the audio where your notes were taken.

It’s a wonderful tool that’s revered by newbies and seasoned reporters alike. It’s a great thing. It was impressive when it came out and remains an impressive feat of technology today. All that acknowledged, it’s time to put out some caution for the newbie or seasoned writer that utilizes it. Many will have seen these ideas or perhaps assume everyone already knows these things. We’ll assume the weakest link doesn’t and strengthen the chain.

First things first: if you’re going to use it, it’s not good to rely on it. Computers are funny. Sometimes they appear to be recording but aren’t. Sometimes they’re recording so much background noise it makes the audio useless. Sometimes you, the operator, forget to turn on the mic. It can be beneficial to pretend you do not have it. As the saying goes, if you didn’t hear that answer, don’t assume the microphone did.

It can be beneficial to take jobs without it for three reasons. Firstly, it gives you an accurate idea of where you’re at. If you need a repeat every few seconds it feels awful, but it gives you an honest understanding that when you find some time, you need to work on that speed, or work on that particular accent, or improve whatever is going wrong within your control. There are resourceful tricks we often only come up with if we are forced to get it and do not allow ourselves to “let the audio catch it.”

Then there is also a boon to your wallet. If you rely on audio, then you listen to the entire deposition over, and it can literally double or triple your transcription time to listen to something more than once. Time is money, and very few of us have time to spend listening to every job over. Learning to read misstrokes and getting to glide from word to word will save you time and money in the long run. In the short run, you can also listen to music while transcribing.

If you’re planning on taking an employment test, the ability to walk into a job without audio is priceless. Your transcription skills and on-the-spot resourcefulness will be as sharp as it gets. You will have the ability to cope with getting it under pressure.

In the view of many, AS has done wonders for the field, but also hurt us badly. We graduate at 95 percent accuracy. Many of us go on to let the audio catch it, resulting in lower accuracy, longer transcription times, and tougher times passing examinations for certification or employment. This isn’t to ostracize those among us that use it or even rely on it, but to encourage that occasional job where you shut it off and let yourself develop skills in polite interruption and writing resourcefulness that this generation of reporter just hasn’t had to develop.

Veritext Buys A Diamond

In a perhaps not-so-surprising move, Veritext bought Diamond. I wish every reporter a great deal of luck and success, but I do want to talk a little bit about why I think this is overall bad for us.

Corporations are entities made to create a profit for their owners. That’s their legal and primary purpose. There’s nothing really wrong with this, it’s kind of how things work. When you buy a stock in a public corporation, generally you can rest assured that the Board of Directors has a duty to protect the value of your shares. Yay.

But this poses a unique problem for reporters. Their duty is to their bottom line. What’s one of the biggest expenses? Labor. What’s labor in reporting? Our fees! So ultimately, Veritext, which I now nickname Gobbler Corporation, has bought its way into having what I imagine to be a pretty hefty book of business. This is bad for the following reasons:

If the reporter shortage continues, they have an incentive to push audio recording. It is cheaper and it will always be cheaper to get someone to take notes during a proceeding while it’s being recorded than hiring a stenographic reporter. This savings isn’t likely to be transferred to the lawyers and litigants, but added to Veritext’s bottom line.

If the shortage does not continue, Veritext has a larger market share of New York and will have a better ability to dictate prices to its reporters.

Honest solutions? We need to be better on our information game. We need to keep instructing reporters on what we are worth and encourage them to be powerful entrepreneurs. I’ve written before in this blog about how people can negotiate or seek information on government contracts. Perhaps soon I can write about becoming an NYC Vendor. Now is the time! More than that: We need to start fighting harder. As they start shifting to recorders, resist. Call up your favorite law firm and offer your services. Become the competition. Make them buy you out too. Reach out to law firms and tell them, hey, they’re cutting us out, and they’re not passing those savings to you, so hire a stenographic reporter today for a better deal!

This is the best damn time to be a reporter that I’ve seen in New York. The court system wants you. The unions want you. The association wants you. The agencies want you. Your skills are in real demand. But your willingness to step out of your comfort zone and really connect with customers, clients, lawyers, and the end users of our services really can alter how everything plays out. What you do actually makes a difference. Why? Strategy. Envision the whole thing as a game of chess. In chess, if you refuse to move, you concede the game. Most of us are not wealthy and can’t concede and stop working. If you let the other player take all your pieces off the board, the sources you rely on for work, pulling off a win grows ever more challenging. If you start making moves, you force the opponent to react. Their game gets thrown because they can’t account for every move you make. Every dollar an entity gets is a dollar that makes them stronger. What do you think happens if the hundreds of stenographers in the city start taking dollars away by being real competition?

And we’re bothering people that want stenography to fail big time. The fact that we’re catching on and creating a plan to fight back is hurting them so bad that they’re gloating at me in anonymous e-mails about how our days are numbered.

So the choice is simple. Concede and let the current shotcallers decide how things are going to go, or step it up and take the time to read about how to draft responses to city RFPs (requests for proposals) and become true entrepreneurs, and introduce true competition to a needy, living market. Remember that a market is not just “oh, they want to pay me this”, but an amalgam of buyers and sellers, all seeking the best deal for themselves. Remember that as a provider you are the backbone of the market, and it’s your action or inaction that dictates tomorrow.

Veritext bought a Diamond. There’s no reason we can’t build ten more.

The Good Reporter Fallacy

I’ll just come out and say it. There are folks among us that think speech recognition technology is going to beat court reporters. I’ll even go so far as to say I personally believe that the technology will eventually do what we do.

But first on the issue of technology: Read what they aren’t saying. The technology is 95 percent accurate! But what was the setting under which it was accurate? Was there an air conditioner blowing overhead? Was someone printing directly behind the recording machine? Were there people speaking over each other? Did the computer accurately designate who spoke? Was the computer able to handle an unidentified speaker? Were there multiple speakers at different distances? Did the test take place in rooms of various sizes and acoustics? Yes, it is my sincere and honest belief that someday technology will be there to seamlessly do all of this, but there is no telling when. Until it is there, it is smoke. They’re blowing smoke just like everyone else who wants to sell a product. When it is there, we still fight to keep the jobs we have.

And that’s the topic of today’s missive. So some believe we’ll be overtaken by technology, and they are saying: Do not invite people into this dying field. That makes sense if you take the fact that it is dying as true and completely irreversible. Our current crisis in the reporting world is a reporter shortage, and their answer is: Don’t do anything and ride this career to the bitter end. There will be jobs for the good reporters.

But what is that good reporter? A realtimer? There are more non-realtime jobs than realtime jobs. Even today, there are more non-realtime jobs. So even if everyone is a good reporter tomorrow, there aren’t enough jobs for you if we give up that non-realtime work. Sorry. It’s a delicate balance. Reporter shortage means it’s easier for customers to swap to recording because there simply aren’t enough of us to meet a demand. Reporter glut means we suffer because high supply generally means lower cost (wage).

Well, right now, at this second, we are facing a shortage, and in the great wide world of life, we have better chances if there are more of us. Consider NCRA’s old strength of what I’m told was 30,000 reporters versus today’s — whatever — 15,000. That was literally double the budget to fight for reporters. Double the constituents when politicians ask how many people they are representing. Double pretty much everything.

So you can get up and introduce this field to somebody and be a part of ending the shortage, or you can sit it out and see what happens, and we can be friends either way, but I think it’s best to act. It’s very simple statistically: Can’t win if you don’t try. People play lotto on that same principle, so isn’t a shot at saving thousands of careers worth trying too? Look at politics. When your preferred politician or proposed legislation fails, do you just drop everything and say “I support this because it’s happening”? Maybe, but according to my Facebook, not likely!

If you’re an average person, you matter. History was built on average people. Armies are built out of average people. Battles were won when average people got the enemy army to rout. The computer technology we use started with overall average programmers using punch cards to give computers simple instructions!

If you’re above average, show us. We average people want great leaders. We want problem solvers and talented people to look up to. There’s a market for greatness and a world of ways to uplift people. What if someone could design a steno program that got someone out in months, not years? What if someone could design a political campaign capable of sustaining our jobs even when the technology does what we do?

Be a creator. Be an inventor. Be an innovator. Support the people fighting for you. Support the people around you. Support yourself. Do what people say can’t be done. Be a winner. And remember, besides thermonuclear war, there are few times you can win by not playing.

Audio Transcription, Pricing, And You

First and foremost, happy Thanksgiving. As with most great writers, I’m going to take the time away from preparing for the holiday to write about something I know everybody will want to read about: Audio transcription and pricing. As stenographers, we tend to get very focused on a per-page pricing structure. This often leaves us trying to measure our time by pages, and is not always the most effective way of being paid.

For purposes of this post, let’s talk a little about CART, audio transcription, and pricing generally. CART and audio transcription are not the same thing, but they have similarities. One key similarity is that they tend to charge by the hour. For CART it’s per hour of writing, usually with a set minimum, and for audio transcription it’s money per hour of audio, sometimes prorated for audio that doesn’t last a whole hour or end exactly on an hour.

Succinctly, for CART, captioning, and audio transcription, despite having different prerequisite skills, the pricing for all of them must take into account the amount of work we’re doing, the quality of the work we’re doing, and ultimately the time it will take us to do the work. So speaking strictly for transcription: I’ve guesstimated that it takes me approximately one to two hours of transcribing for every hour on the machine to get pretty close to 100% accuracy. Counting the hour on the machine itself, that means for every hour of audio, there are about three hours of actual work involved. So, for me, honestly, working for less than $30/hr becomes painful, and at three hours of work per hour of audio, the transcription deal isn’t sweet until maybe the $100-something range per hour of audio. The bottom line of this story? We must examine our time and really decide what it’s worth.

In examining our time, we can also consider other factors. For example, what are other people charging for the same work? As we can see from this Google search here, there are companies that boast a $1/minute transcription fee. So if we do an independent assessment of our time, and we come to the conclusion our time is worth $2/minute, that’s perfect, but just bear in mind that we may lose a couple of customers to the person who is half our price. A potential solution? Split the difference and charge $1.50 per minute.
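To see what those per-minute prices mean for your own time, here is a quick Python sketch. It assumes my estimate of roughly three hours of work for every hour of audio, and your own ratio may be very different. The function and constant names are made up for the example.

```python
WORK_HOURS_PER_AUDIO_HOUR = 3.0  # my own rough estimate; plug in yours


def hourly_earnings(price_per_audio_minute,
                    work_hours_per_audio_hour=WORK_HOURS_PER_AUDIO_HOUR):
    """Convert a per-audio-minute price into dollars per hour actually worked."""
    fee_per_audio_hour = price_per_audio_minute * 60
    return fee_per_audio_hour / work_hours_per_audio_hour


for price in (1.00, 1.50, 2.00):
    print(f"${price:.2f}/minute -> ${hourly_earnings(price):.2f} per hour worked")
```

Under that assumption, $1 a minute works out to $20 for each hour worked, $1.50 to $30, and $2 to $40, which is why the per-minute number alone can be misleading.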

There’s a lot that goes into economics, buying, selling, demand, supply, and no one blog post could ever impart all of that knowledge to anyone. Even top economists who have devoted their lives to understanding value and money disagree with each other. The best we can do is urge every reporter, where applicable, to look at what they charge, whether charging an agency, lawyer, or outside consumer, and consider how our pricing practices affect all different areas of the field. There’s tons of literature and articles on price matching and how it can help consumers, hurt consumers, help businesses, and hurt businesses, and the cold truth is that it’s up to us to take the time out and learn about these things, because many of us are our own business, and our business rises or falls on our willingness to learn beyond the machine.