Automated Detection of Terrorism Press Releases
Economic Analytics, Spring '22 (University of Arkansas)
Career competency: technology
​
The context most relevant to my audience is the ability to prepare and manage large datasets and to execute an effective empirical approach. Employers will want to see that I can both turn raw data into usable data and develop accurate machine learning models in key languages (Python, R, etc.).
For this artifact, I developed a project proposal, scraped text files from the DOJ website, optimized a RandomForestClassifier, and presented the results at a Master's poster session. The objective of our project was to create a trained model that the UArk Terrorism Research Center (TRC) on campus could potentially use to automate the classification of press releases. Currently, the center recruits interns to manually sort all DOJ press releases as "terrorism" or "nonterrorism" for their ATS database, a slow and labor-intensive process. My team regularly consulted Dr. Hyunseok Jung, our course professor, to ensure we were on track. The result is a proposal, a poster, and several optimized supervised ML models (multinomial naive Bayes, SVC, and RandomForest).
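To illustrate the kind of workflow described above, here is a minimal sketch of tuning a RandomForestClassifier on text with scikit-learn. It is not the project's actual script: the toy documents, the TF-IDF features, and the parameter grid are all assumptions made for illustration, and the real features and settings may have differed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the scraped press releases.
texts = [
    "Man charged with providing material support to a terrorist organization",
    "Defendant sentenced for plotting a bombing attack on a federal building",
    "Suspect indicted for attempting to join a foreign terrorist group",
    "Jury convicts man of conspiring to use a weapon of mass destruction",
    "Company agrees to pay settlement in health care fraud case",
    "Former accountant pleads guilty to tax evasion scheme",
    "Pharmacy owner sentenced for illegally distributing opioids",
    "Bank executive charged with wire fraud and money laundering",
]
labels = ["terrorism"] * 4 + ["nonterrorism"] * 4

# Convert raw text to TF-IDF features, then classify with a random forest.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Assumed search grid; real tuning would cover more values on far more data.
param_grid = {
    "forest__n_estimators": [50, 100],
    "forest__max_depth": [None, 5],
}

# Cross-validated grid search picks the best-scoring combination (accuracy by default).
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)

best_params = search.best_params_
prediction = search.predict(["Two men arrested for planning a terrorist attack"])
print(best_params, prediction[0])
```

With only eight documents the selected parameters are not meaningful; the point is the structure, where vectorization and the classifier are tuned together so that cross-validation never leaks test text into the vocabulary.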
Throughout this project, I learned a lot about preparing raw datasets to be run through defined models. This was my first time working with text data and text classification, which proved a bit of a learning curve (though an exciting challenge). I learned to use packages like BeautifulSoup to build rich datasets (over 120k documents) to train and test my models. I appreciated the intuition I had previously gained in statistics and economics courses, which eased the process a lot. I was most proud of my own and my teammates' persistence in debugging the model scripts. There were a lot of Google searches and a few late-night Zoom calls, but at the end of the day we managed to get everything to work and create the visuals we needed for the poster.
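As a rough sketch of the text-extraction step mentioned above, the snippet below uses BeautifulSoup to pull a title and body text out of an HTML page. The sample HTML and the `extract_text` helper are hypothetical; the actual DOJ page structure and the project's scraping code are not shown here and likely differed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded press release page.
sample_html = """
<html><body>
  <h1>Sample Press Release</h1>
  <div class="body">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
  <script>ignored()</script>
</body></html>
"""


def extract_text(html):
    """Return the page title and paragraph text as a dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style tags so their contents never reach the dataset.
    for tag in soup(["script", "style"]):
        tag.decompose()
    title = soup.find("h1")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return {
        "title": title.get_text(strip=True) if title else "",
        "text": " ".join(paragraphs),
    }


record = extract_text(sample_html)
print(record)
```

Applying a function like this to each saved page would yield one record per document, which can then be labeled and passed to a classifier.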
As a result of this project, we have a proposed model that can hopefully be applied to accelerate the important research conducted by the TRC. I hope to use the skills I learned to continue diving into research to tease out causality and trends in more real-world economic applications. I would be interested in learning more about unsupervised ML and deep learning, and in further improving my proficiency in the languages used in this project, particularly Python.