"Googlizing" military intelligence searches: The next frontier for sifting through all that UAV (and other) data
July 27, 2012
Saying that military intelligence analysts have their work cut out for them these days is an understatement. The explosion of new UAV sensors and other information-gathering technologies is guaranteed to capture exponentially more Intelligence, Surveillance, and Reconnaissance (ISR) data, but someone - or something - needs to make sense of the resulting avalanche, whether it's full-motion video, Intelligence Information Reports (IIRs), or myriad other structured or unstructured data.
Instead of a human analyst slogging through perhaps 10 three-page documents in an hour, Modus Operandi's military intelligence “Googlizing” system, “trained” via machine learning and enabled by a semantic analysis engine, gives the military intelligence community a way to conduct intelligence searches as simply as one would run an ordinary Google search on a home computer. The difference (besides the content and data sources, of course) is the interface with the military’s DCGS Integration Backbone (DIB). Managing Editor Sharon Hess recently caught up with Tony Barrett and Mark Wallace of Modus Operandi to get a behind-the-scenes look at this technology. Edited excerpts follow.
First, tell us a little about Modus Operandi.

BARRETT: Modus Operandi is a software technology company focused on rapidly solving complex intelligence information discovery, integration, and fusion problems for defense and intelligence customers. The company has offices in Melbourne, Florida, and Aberdeen, Maryland, and employs fewer than 100 people.
I received some information from Modus Operandi about a technology described as “Googlizing military intelligence.” So what is this program or technology, and what is it called?
BARRETT: Internally I think we’re calling it Blade, but a generic technology description is: a Wiki-based semantic engine that handles intelligence data, in the interest of shortening the intelligence analyst’s research track.
The background says the “Googlizing” capability is enabled by a semantic engine, as you mentioned, and machine learning.
BARRETT: Yes. We are adding semantics to structured and unstructured data. That makes the data smarter as it enters the intelligence flow, so to speak. We come up with dictionaries and vocabularies that train the computer to recognize words, ideas, and concepts that people intuitively know, so that as intelligence data is fed into the system, it automatically correlates or corroborates intelligence to help the analyst figure out what’s important and what’s not.
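[To make the dictionary idea concrete, here is a minimal sketch, not Modus Operandi’s actual implementation, of how a controlled vocabulary might tag concepts in free text. The vocabulary entries and concept labels are purely illustrative.]

```python
import re

# Illustrative controlled vocabulary mapping surface phrases to concepts;
# a fielded system would use curated dictionaries and formal ontologies.
VOCABULARY = {
    "improvised explosive device": "Event:IED_Emplacement",
    "ied": "Event:IED_Emplacement",
    "pickup truck": "Entity:Vehicle",
    "meeting": "Event:Meeting",
}

def tag_text(text):
    """Return (phrase, concept) pairs whose phrase occurs in the text."""
    found = []
    lowered = text.lower()
    for phrase, concept in VOCABULARY.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.append((phrase, concept))
    return found

report = "Source observed a white pickup truck near a suspected IED site."
print(tag_text(report))
# [('ied', 'Event:IED_Emplacement'), ('pickup truck', 'Entity:Vehicle')]
```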
WALLACE: Also, Googlizing involves going through lots and lots of intelligence data and making a model of what [the semantic engine] has seen. Then when people search with it, they can get to that data quickly. We can crawl through lots of data in advance; people can be sending [Blade] video data saying, “Here is what we have found” – they are crawling video data and giving their results to [our system] – while other systems or our own software might be crawling through documents and figuring out what we need to store, understand, and put in our model. Then someone either comes to the Wiki or does a search on the DIB [DCGS Integration Backbone]; those are the two ways to get at the information. Once they click on a link, they are in a model like you would see in Wikipedia, where you have a page on something and links to other pages.
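[A rough sketch of the wiki-style model Wallace describes: extracted facts stored as subject-predicate-object triples, indexed so every entity gets its own “page” of cross-linked facts. The entities and predicates are invented for illustration; a fielded system would use a real semantic store.]

```python
from collections import defaultdict

# Hypothetical extracted facts as subject-predicate-object triples.
triples = [
    ("Person:SubjectA", "alias", "Abu X"),
    ("Person:SubjectA", "participated_in", "Event:Meeting_42"),
    ("Event:Meeting_42", "located_at", "Place:Checkpoint_7"),
]

# Index by subject so every entity gets a "page" of facts, with objects
# doubling as links to other entity pages, much like wiki cross-links.
pages = defaultdict(list)
for subj, pred, obj in triples:
    pages[subj].append((pred, obj))

for pred, obj in pages["Person:SubjectA"]:
    print(f"Person:SubjectA --{pred}--> {obj}")
# Person:SubjectA --alias--> Abu X
# Person:SubjectA --participated_in--> Event:Meeting_42
```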
How does the semantic analysis differ from the machine learning?
WALLACE: The semantic analysis involves how you go through and identify what the different words are in the text, for example, and what they mean and whether they are of interest: “Is this representing an event that we care about or a person we care about?” Machine learning is a little different from that; it’s really just feeding back if it turns out [the system] missed something – maybe we missed the fact that this is an alternative name for this person. Then when a human reviews the document later, they’ll say, “Yeah this is an event about this guy that was missed.” It’s a way that the machine learns, “Oh, well this is maybe an alias of that guy” or “This is another way to describe an IED emplacement event,” for example.
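[A toy illustration of that feedback loop, under the assumption that analyst corrections simply extend an alias table consulted on later documents; the names and identifiers are hypothetical.]

```python
# Illustrative feedback loop: an analyst's correction extends the alias
# table, so later documents mentioning that alias resolve automatically.
aliases = {"abu x": "Person:SubjectA"}

def resolve(name):
    """Map a surface name to a known identity, if we have one."""
    return aliases.get(name.lower())

def analyst_feedback(missed_name, canonical_id):
    """A reviewer flags a name the system missed as a known identity."""
    aliases[missed_name.lower()] = canonical_id

print(resolve("Abu Y"))               # None: the system misses it
analyst_feedback("Abu Y", "Person:SubjectA")
print(resolve("Abu Y"))               # Person:SubjectA: learned from review
```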
Why is this military intelligence technology needed? How many hours of intelligence are there to power through?
BARRETT: The bottom line is most intelligence analysts are drowning under a wave of data. Over the past 10 to 15 years, the intelligence community has done an absolutely magnificent job of acquiring new sensors, which has created an onslaught of data that is crushing intelligence analysts. So what we are trying to do is take that data being fed to them from all of these sensors – whether it’s full-motion video or text, whether it’s structured or unstructured – [and make it accessible to operators]. For a human brain to read, internalize, and then make decisions off of all of that data is extremely challenging, especially when, for example, 90 percent of what you read might be chaff and you are trying to find the golden nuggets.
WALLACE: Yeah, how many three-page documents can an analyst go through in an hour? Five to 10 maybe. With the automation, you’re talking hundreds or thousands of documents per hour, so that’s where applying the computer helps the analyst avoid overload.
Does a lot of the data that your technology uses come from UAVs?
BARRETT: Yeah. We have partnered through the Office of Naval Research with a company called Defense Analytics Corporation, or DAC. They have produced a technology that can stare at full-motion video coming out of UAVs and identify the things that are most important to the analyst. Say, for example, [their technology was] looking for a white pickup truck, and their machine, staring at a full-motion video feed from a UAV, notices one. It will identify the portion of the video where the white pickup truck is visible, tag it with an indicator, send it to our system, and provide an alert to the analyst along with a link to the source video, so the analyst can go straight back to that video and take a look at it.
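[The description above suggests an alert record carrying the tag, the segment bounds, and a pointer back to the source footage. The sketch below is one guess at such a structure; the field names and the vmdc:// link scheme are made up for illustration.]

```python
from dataclasses import dataclass

@dataclass
class VideoAlert:
    """Hypothetical record tying a tagged video segment to its source."""
    sensor: str       # platform or feed that produced the video
    tag: str          # what the detector matched, e.g. a vehicle type
    start_s: float    # segment start, seconds into the feed
    end_s: float      # segment end
    source_url: str   # link back to the stored full-motion video

alert = VideoAlert(
    sensor="UAV-feed-01",
    tag="white pickup truck",
    start_s=1824.0,
    end_s=1851.5,
    source_url="vmdc://video/UAV-feed-01/2012-07-27T1430Z",
)
print(f"ALERT: {alert.tag} at {alert.start_s}s, see {alert.source_url}")
```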
With which types of sensors or data does your system work?
BARRETT: Whichever sensors are out there. So if you want to talk Predator, Rover, Raven – we want to have the ability to ingest that intelligence into the system. The system is also ground-station agnostic, which is important because we don’t want to get tied to whatever proprietary technology a particular ground station uses. We’re more interested in the data and how it feeds into the overall intelligence enterprise, rather than in specific data types.
However, our system loves structured data, like images that come in pretagged with metadata off the sensor; it loves that because the data is already well described as it enters the system. But the true power behind our system is in the unstructured data and being able to correlate the structured data with the unstructured stuff. And when I say “unstructured stuff,” I am talking about open-source intelligence.
The other thing is, human intelligence reported in Intelligence Information Reports (IIRs) is highly unstructured, yet it might describe, say, a meeting that someone had. We want to be able to ingest that and make the machine understand not just the spirit but also the intent behind the report – what it was saying and how valuable that information could be. So what we’re trying to do is cover “270 degrees of the intelligence cycle,” if you will. The only thing we are not doing is directing collection assets.
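[A minimal sketch of the structured-to-unstructured correlation Barrett describes, assuming both sides have been reduced to shared concept tags; the products and tags shown are illustrative.]

```python
# Illustrative correlation: concepts pulled from an unstructured report
# are matched against metadata tags on structured sensor products.
structured_products = [
    {"id": "IMG-001", "tags": {"white pickup truck", "checkpoint"}},
    {"id": "IMG-002", "tags": {"compound", "courtyard"}},
]

iir_concepts = {"white pickup truck", "meeting"}  # from semantic tagging

matches = [
    p["id"] for p in structured_products
    if p["tags"] & iir_concepts  # shared concepts suggest a correlation
]
print(matches)  # ['IMG-001']
```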
How fast does this system work?
BARRETT: That’s what is being borne out right now through the program [we’re working on with] the Office of Naval Research. The problem is that there is a trade-off among time, storage capability, and currency of data. What we’re trying to do is figure out where the trade-offs are, because the bottom line is that most searches are returned in milliseconds, no matter what you’re doing.
How does intelligence information feed into your system?
WALLACE: As I mentioned earlier, our technology builds a model, so that is based on work that happens in advance of the operator doing a search – as long as we have the pipes, so to speak, to get that data in. We can get the data in by watching a folder or by having other Services feed us data through a network connection. Then when an operator wants to find something, typing in some keywords might get them to their starting point. Once they get that initial hit, it works kind of like Wikipedia.
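[“Watching a folder” can be as simple as polling a drop directory for new files. Below is a bare-bones sketch of that pattern; the folder name and file handling are stand-ins, and a real pipeline would hand each file to the semantic engine rather than print it.]

```python
import time
from pathlib import Path

WATCH_DIR = Path("incoming_reports")  # illustrative drop folder
seen = set()

def poll_once():
    """One polling pass: pick up any report files not yet processed."""
    for path in sorted(WATCH_DIR.glob("*.txt")):
        if path not in seen:
            seen.add(path)
            # A real pipeline would hand the text to the semantic engine.
            print(f"ingested {path.name} ({path.stat().st_size} bytes)")

while True:
    poll_once()
    time.sleep(5)  # simple polling; OS file-event APIs would avoid this
```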
Is the “model” relative to a certain topic or individual, or is it a display mechanism for any search?
BARRETT: If they are hunting a person, for example, they have a very concrete idea of what information they need going into that problem set and what they should produce coming out of that problem set in order to allow the prosecution of that mission. What we are doing is integrating that into the very fabric of the product that we are building, so that the algorithm and the technology that we build reflect precisely what they have told us they need in order to produce intelligence products.
Your original press release says the system displays the results in some kind of a Google Earth format?
WALLACE: Yes. The idea is that a lot of map capabilities use common formats, which means the same data can be rendered in Google Earth, in a Yahoo map, or in certain other military or open-source mapping capabilities. So in the Wiki, there’s a little area of the page where you could show different events related to a person in a table or plot them on a map.
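[One such common format is KML, the open XML format Google Earth reads. The snippet below sketches generating a single placemark; the event name and coordinates are invented.]

```python
# Emit a minimal KML placemark (the open XML format Google Earth reads);
# the event name and coordinates below are invented for illustration.
def placemark(name, lon, lat):
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>{name}</name>
    <Point><coordinates>{lon},{lat},0</coordinates></Point>
  </Placemark>
</kml>"""

print(placemark("Event:Meeting_42", 45.42, 33.31))
```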
You mentioned DCGS earlier. How does your system interface with that?
BARRETT: Our focus right now is getting it into the DCGS Enterprise in the interest of wider federation across all Services. Modus Operandi is a Small Business Innovation Research (SBIR) company, and what we are doing right now is transitioning this technology from the research lab into the hands of the operator.
Are any of the Services using this system right now?
BARRETT: No. We are in the middle of testing right now. We are working very heavily with the Marine Corps, which operates out of something called a MAGTF, or Marine Air-Ground Task Force. What that means is that they have all of the pieces and parts required to execute a battle within their own control: the air, the ground, the command structure, the logistics support. So we are working on transitioning this technology to a Marine unit, which would allow them to correlate and corroborate all of the intelligence they receive from both the national level and their organic assets in order to prosecute a mission they have been assigned.
Is there a date for when testing of your system is going to be done?
BARRETT: The decision-making process for that is ongoing. I can’t give you a solid answer because I know target dates and not exact dates at this point, but it looks like 2013.
[As an aside], we have been talking about the full system package [Blade]. However, one of the pieces of technology that feeds into [Blade] is something called a Virtual MetaData Catalog [VMDC], which allows users to hook into the DCGS Enterprise and share data across all Services as well as the intelligence community. That [component of Blade] is being used by one of our customers at Fort Meade. What we’re trying to do is make it part and parcel of the full [Blade package], which includes the Wiki front end, the Wave semantic engine that underlies it, and the VMDC.
So in essence, there are multiple parts to the full system technology we’ve been describing. What we’re working on right now is taking the full package and putting it together within the concept of operations of how the Marine Corps sees its tactical operations running. That’s what we are trying to get into the hands of the operator right now.
Referring to the full package system we have discussed during this interview, what are the limits or challenges of this technology?
BARRETT: I think a lot of the challenges pertain to predictive analysis. What you want to be able to do – the whole point behind intelligence – is to make a guess or determination about what the enemy is going to do. I think the next stage beyond getting the relevant information to the analyst is to determine “What does that mean?” and “What does that predict the enemy is going to be able to do?” So I think predictive analysis is the next frontier.
You mentioned the Office of Naval Research earlier. I assume they are providing funding for Blade?
BARRETT: Everything that we produce is GOTS – Government Off-the-Shelf – technology. They are paying for it. The contract vehicles that we use to develop this technology mean that the technology is government owned.
Do you have any commercial customers, or would you ever expand this to a commercial realm?
BARRETT: That is not in our vision right now. Considering the challenging time this country is in, we’re trying to provide the best product to the government based on its R&D investment.
Tony Barrett, a retired U.S. Marine intelligence officer, is the Senior Manager of Intelligence, Surveillance, and Reconnaissance (ISR) Technology Integration for Modus Operandi. He has more than 23 years of experience in the Defense Department as an intelligence professional and has received numerous awards for his work in both combat-deployed and garrison environments. He is a recognized expert in U.S. intelligence operations and ISR data-handling protocols and procedures.
Mark Wallace is the Principal Engineer for Semantic Applications at Modus Operandi. He has more than 25 years of experience in software development and 15 years of experience as lead architect on software projects for the DoD and private industry. Prior to rejoining Modus Operandi in 2009, Mark served as chief architect and ontologist at 3 Sigma Research. He also served previously at Modus Operandi as Director of Product Development for the Wave product.
Modus Operandi 321-473-1400 [email protected] www.modusoperandi.com