7. What is the keyword search and data mining phase?
This phase is where it all comes together. It is also the most time-consuming and costly component of the process. The client will work with our computer forensics team to produce a search term - extraction list, which will be our guide during the process. This list will then be applied to each image / index - catalogue. The following items should shed some light on this process:
Each image / index - catalogue is processed separately from all other images. This means that applying a 10-term search list to 10 images is equivalent to applying a 100-term search list to 1 image. Also keep in mind that booting up 10 drives involves more overhead than booting up 1 drive.
Search terms need to be as specific and relevant as possible, because the search process applies each term to every file and/or space on the image. Consequently, generic or abbreviated terms will result in a large number of false hits.
A hit can be defined as the appearance of a search term in either a file or a file remnant. In the past, we have seen the hits for a single search term exceed 1 million, and we have seen the file count for a single search term exceed 10 thousand.
Every hit (false or otherwise) will need to be reviewed and have its relevance determined. In other words, the fewer false hits, the better.
In general, one image with approximately 10 to 15 terms applied against it can be searched, and the results reviewed, in about 12 to 16 hours.
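The points above can be illustrated with a short Python sketch. It is only a rough illustration, not the tooling our team actually uses: the function names, the sample file contents, and the assumption that review time scales linearly with the number of images (ignoring drive overhead) are all hypothetical. It shows how terms multiply across images, how a rough hour range follows from the 12 to 16 hour figure, and why a generic term produces far more hits than a specific one.

```python
import re


def term_applications(num_terms, num_images):
    """Each term is run against each image separately, so the work multiplies."""
    return num_terms * num_images


def estimate_hours(num_images, hours_per_image=(12, 16)):
    """Rough (low, high) hour range, ASSUMING linear scaling per image
    and 10 to 15 terms per image. Drive overhead is not modelled."""
    low, high = hours_per_image
    return (num_images * low, num_images * high)


def count_hits(term, files):
    """Return (total hits, files with at least one hit) for a
    case-insensitive literal match of `term` in each file's text."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    total_hits = 0
    files_with_hits = 0
    for text in files.values():
        hits = len(pattern.findall(text))
        total_hits += hits
        if hits:
            files_with_hits += 1
    return total_hits, files_with_hits


# Hypothetical file contents, invented purely for illustration.
sample_files = {
    "memo.txt": "The merger with Acme Holdings closes Friday.",
    "notes.txt": "Remember to commit the permit paperwork; submit it by noon.",
    "log.txt": "It is a bit late. Editing items in the kit.",
}

if __name__ == "__main__":
    # 10 terms on 10 images is the same workload as 100 terms on 1 image.
    print(term_applications(10, 10), term_applications(100, 1))
    print(estimate_hours(10))
    # A specific term versus a generic, abbreviated one.
    print(count_hits("Acme Holdings", sample_files))
    print(count_hits("it", sample_files))
```

In this sample, the abbreviated term "it" matches inside unrelated words such as "commit" and "items", so it touches every file, while the specific term "Acme Holdings" matches only where it is actually relevant. Every one of those extra matches would still have to be reviewed.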