Identify characteristics of individual Soldiers that contribute to unit performance
Key to extending work to unit performance without requiring unit-level measurement
Document analysis is a "form of qualitative research in which documents are interpreted by the researcher to give voice and meaning around an assessment topic" (Bowen, 2009)
Our Approach: A Mixed Method Thematic Approach
Natural language processing (NLP) "strives to build machines that understand and respond to text data much the same way humans do" (IBM, 2022)
All NLP for this project was run on a corpus composed of the 10 documents on which document analysis was conducted. The documents were cleaned by removing stop words (e.g., the, and, of) and lemmatizing each word. Lemmatizing reduces each word to its common base form.
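The cleaning step can be sketched in a few lines of Python. This is a minimal illustration only: the stop-word set and the lemma lookup table below are small hypothetical stand-ins, not the resources used in the project, and a real pipeline would normally rely on an NLP library for both.

```python
import re

# Illustrative stop-word list and lemma lookup. Both are assumptions made
# for this sketch, not the lists used in the project.
STOP_WORDS = {"the", "and", "of", "a", "to", "in", "is", "are"}
LEMMAS = {
    "soldiers": "soldier",
    "leaders": "leader",
    "units": "unit",
    "performed": "perform",
    "performing": "perform",
}

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and map each token to its lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]

cleaned = preprocess("The Soldiers and leaders of the units performed well")
print(cleaned)
```

Words missing from the lookup table pass through unchanged, so the sketch degrades gracefully on vocabulary it does not know.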
Co-occurrence analysis visualizes the word pairs that occur together most frequently in the corpus.
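The counts behind such a visualization could be produced as follows. The sketch assumes that "occurring together" means appearing in the same document; the project may have used a sentence or sliding-window definition instead. The three toy documents are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for tokens in docs:
        # sorted(set(...)) makes each pair unique and order-independent.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts

# Toy, already-cleaned documents (hypothetical).
docs = [
    ["soldier", "leader", "team"],
    ["soldier", "team", "cohesion"],
    ["leader", "team", "trust"],
]
pairs = cooccurrence_counts(docs)
print(pairs.most_common(3))
```

The resulting pair counts are what a network or heat-map plot would then display, with the most frequent pairs drawn most prominently.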
TF-IDF (term frequency-inverse document frequency) is a term-weighting statistic that reflects the importance of a word to a document in a corpus of documents.
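As a rough illustration, the sketch below uses one common TF-IDF variant: term frequency normalized by document length, multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries often apply different smoothing and normalization, so the exact weights used in the project may differ. The toy documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Toy, already-cleaned documents (hypothetical).
docs = [
    ["soldier", "leader", "team"],
    ["soldier", "team", "cohesion"],
    ["leader", "team", "trust"],
]
scores = tf_idf(docs)
print(scores[1])
```

A word that appears in every document, such as "team" here, receives a weight of zero, while a word confined to one document, such as "cohesion", is weighted highest in that document.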
LDA (latent Dirichlet allocation) is a Bayesian topic model that discovers the topics in a corpus of documents and estimates the probability that each topic occurs in each document.
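To make the idea concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, one standard inference method for the model. It is not the implementation used in the project, which is not specified in the source. The function name `lda_gibbs`, the hyperparameter values, and the toy documents are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] and
    topic_word[k][v] are smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                             # total tokens per topic
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized conditional probability of each topic.
                probs = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta)
                    / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[k][v] + beta) / (nk[k] + n_words * beta) for v in range(n_words)]
        for k in range(n_topics)
    ]
    return doc_topic, topic_word, vocab

# Toy, already-cleaned documents (hypothetical).
docs = [
    ["soldier", "leader", "team", "trust"],
    ["team", "cohesion", "trust", "leader"],
    ["fitness", "endurance", "strength", "fitness"],
    ["strength", "endurance", "soldier", "fitness"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)
print(doc_topic)
```

Each row of `doc_topic` is a probability distribution over topics for one document, and each row of `topic_word` is a distribution over the vocabulary for one topic. With only four short documents the sampled topics are not meaningful; the sketch shows the mechanics only.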