Searching Voice Mail for E-discovery Can Be Problematic

Unified communications is the term for integrating all communications - data and voice - over the Internet: data in its myriad forms, and voice as VoIP (Voice over Internet Protocol) and .wav files. Such integration can cut operating costs, for instance by doing away with long-distance charges when using VoIP. Savings like these accrue to the 26% of businesses that have adopted unified communications. But when litigation demands discoverable data, .wav and other voice files can be difficult and costly for a computer forensics expert to search.

There are many tools designed for searching text files, and even for recovering text from deleted files. These range from computer forensic and e-discovery suites costing thousands of dollars to open source tools, including hex editors, that cost the user nothing at all. There are also many effective tools for scanning paper documents into text files, which are then searchable.
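As a rough illustration of why text is comparatively easy to search, here is a minimal Python sketch of the kind of raw byte search a hex editor makes possible. It is not how any particular forensic suite works; the function name `find_keyword_offsets` and the sample bytes are made up for this example, and the search is deliberately simple (case-sensitive, two encodings only).

```python
def find_keyword_offsets(data: bytes, keyword: str) -> dict[str, list[int]]:
    """Return the byte offsets where `keyword` occurs in `data`.

    The keyword is searched for in two common text encodings, because
    recovered data may hold text as UTF-8/ASCII or as UTF-16.
    Hypothetical helper for illustration only.
    """
    hits: dict[str, list[int]] = {}
    for encoding in ("utf-8", "utf-16-le"):
        needle = keyword.encode(encoding)
        offsets = []
        start = data.find(needle)
        while start != -1:
            offsets.append(start)
            # Resume one byte later so overlapping matches are not skipped.
            start = data.find(needle, start + 1)
        hits[encoding] = offsets
    return hits


# Made-up sample standing in for a chunk of recovered disk data.
sample = (
    b"\x00\x01junk invoice 1234\x00"
    + "invoice".encode("utf-16-le")
    + b"\xff"
)

for encoding, offsets in find_keyword_offsets(sample, "invoice").items():
    print(encoding, offsets)
```

An exact byte match like this either finds the keyword or it doesn't, which is what makes text search so dependable.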

But when it comes to audio, no such level of accuracy or ease yet exists for the purpose of searching for specific information. There are currently three means of searching audio: phonetic search, transcribing by hand, and automatic transcription.

Phonetic search matches wave patterns to a library of known wave patterns. Because people speak in many different ways, with varying accents and dialects, the accuracy of this method is spotty, and it produces many false hits. And while it may identify sections and phrases that are of interest, it doesn't transcribe the audio into text - the audio must then be listened to.

Manual transcription, which produces text that can then be searched automatically, is time-consuming. Because it depends on a person listening to the audio and typing the words heard, it can also be very expensive. There may be security concerns as well, since the audio goes outside the company (or perhaps the country) to be transcribed.

Machine transcription is the one automated means of converting audio to text. But it suffers from accuracy issues. It compares "heard" audio with known libraries, again facing issues of differing pronunciations, terms not in existing libraries, and clarity of recording. While high-quality recordings can lend themselves to recognition rates of 85% or so (a positive-looking number until compared with the nearly 100% accuracy of pure text searches), when dealing with voice mail, accuracy dips down as low as 40%.
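To get a feel for what those percentages mean for search, here is a back-of-the-envelope sketch in Python. It rests on two simplifying assumptions of mine that the figures above do not state: that the quoted rates are per-word accuracy, and that errors on each word are independent. The function name is hypothetical.

```python
def phrase_hit_probability(word_accuracy: float, phrase_length: int) -> float:
    """Chance that every word of a phrase is transcribed correctly.

    Assumes independent per-word errors, which is an illustrative
    simplification rather than a property of any real recognizer.
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    if phrase_length < 1:
        raise ValueError("phrase_length must be at least 1")
    return word_accuracy ** phrase_length


for label, accuracy in [("high-quality recording", 0.85), ("voice mail", 0.40)]:
    for words in (1, 3):
        p = phrase_hit_probability(accuracy, words)
        print(f"{label}: {words}-word phrase found intact about {p:.0%} of the time")
```

Under those assumptions, a three-word search phrase survives transcription about 61% of the time at 85% accuracy, and only about 6% of the time at 40%.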

While requirements for retention of data increase and storage costs go down, identifying what audio should be kept and what should be deleted can be costly. Even once such information is digitized, it must still be stored and indexed (or searched after the fact). The technology is not yet mature and is still evolving. In the meantime, companies face a difficult decision about what stays and what goes.

Read more about this issue in "Switch to unified communications brings major e-discovery headache" by Joe Vanden Plas.
 

Comments
  • 6/8/2008 3:17 PM Benjamin Wright wrote:
    Steve: I argue that voice signature can help to preserve a voicemail audio file. What do you think? 
    --Ben 

    1. 6/9/2008 1:35 PM Steve B wrote:
      First, may I say that I love the name of your blog? Hack-igations. Too great!

      I infer that you are arguing for your assertion and not against the information in the blog entry. 

      I would say that recording onto your computer the recording that had been on your cell phone was a wise and immediate means of preservation. As I am not an attorney, that isn't legal advice on my part - it just seems like it was a good move - preserving evidence is a good thing. As I have done data recovery, discovery work and computer forensic analysis on about 15,000 pieces of digital media, I would say that an important next step would be to back that file (and all your important files) up to another medium.

      The "My Electronic Evidence" service you reference seems like a good idea as well.


  • 6/9/2008 7:26 PM Benjamin Wright wrote:
    Steve: Glad the blog name appeals!

    You said: >>I infer that you are arguing for your assertion and not against the information in the blog entry. <<

    Correct. --Ben