DocRank: Computer Science Capstone Project with the College of Charleston

Throughout the 2017 spring semester, PokitDok sponsored a capstone project through the Computer Science Department at the College of Charleston. PokitDok provided a curated dataset of 5+ million providers, a project concept, and mentorship for three CofC senior students. We challenged the students to build a full-stack application with an intuitive interface for searching, ranking, and viewing providers within the dataset. The devil is in the details, and the CofC students delivered a solid solution.

We would like to introduce you to the three members of the class of 2017 who took on this challenge: Ryan Lile (Computer Science), Kaya Tollas (Data Science/Mathematics), and Sarah Wiegreffe (Data Science). This team designed and built a product called DocRank, a mathematically empowered web app that recommends doctors to users according to a variety of input parameters.


Amidst the renowned CofC graduation festivities, we caught up with the trio to gain insight into why they chose this project and their overall experience.



Why did you select the PokitDok DocRank Project?

Kaya Tollas: I selected PokitDok's DocRank project for a variety of reasons. My mother is a doctor, so I have always been interested in helping to solve the many inefficiencies surrounding healthcare. I also wanted to gain experience in the data acquisition and cleaning process as well as the process of designing and justifying a meaningful ranking algorithm.

Ryan Lile: I was excited to learn more about data science and ranking algorithms. It seemed like a great project to apply what I had already learned from school and also learn a lot about some new technologies.

Sarah Wiegreffe: I wanted to be part of a project with a significant data science component and one which would allow me the chance to learn something new that was a bit outside the realm of my coursework.


Which technical challenges were you most looking forward to tackling with this project and why?

Ryan: I most looked forward to connecting the three pieces (database, server, front end). I had individually done each of these but had not yet built the whole stack from scratch.

Sarah: The matrix algebra involved in calculating the rankings for the doctors, because I had never worked in this field before and got to read and learn a lot about the math involved in this unsupervised learning task.

Kaya: I was looking forward to learning how to scrape and parse HTML data since I had only previously worked with pre-processed and formatted data.


How did your expectations change as you advanced through the project?

Sarah: We realized that the sparsity of the publicly available data we could find, as well as the sheer size of our dataset, would make the ranking calculation, especially the matrix inversion used to generate it, orders of magnitude more computationally expensive, and we had to change our plans accordingly.

Kaya: I had never worked with such a large dataset, so the biggest adjustment I had to make was accounting for the time and computing power necessary to apply changes to the data.

Ryan: Initially I was hoping to just have a working front end that was very simple and basic. Through the project, as I learned React, I was excited to actually build a more robust front end and user experience. We were hoping to do more with natural language processing to enhance the user experience but we were unable to focus on that.


What were the top 3 most valuable experiences that you gained by working with PokitDok for this semester project?

Kaya: My most valuable experiences this semester were learning HTML scraping skills, encountering and learning to overcome the complexity and challenges of real-world data (veracity, inaccessibility, difficult formats, etc.), and engaging in the process of identifying and justifying which data are relevant for a meaningful ranking algorithm.

Sarah: Planning and coordinating as a team, consulting with Dr. Amy Langville on the mathematics, and presenting the project to an audience.

Ryan: Working a real-world project from requirements to completion. Learning a lot of new technologies in order to accomplish our project. Learning to work with a team and communicate effectively.



Would you recommend this experience to another student? Why?

Ryan: Absolutely! Being able to dive into a real-world project and figure it all out on your own was a blast. In school, you get a lot of very specific problems that have a single domain or use. Having the opportunity to build out more robust problems ties all of your classes together and provides a deeper understanding of how things work.

Kaya: I would absolutely recommend this experience to another student. This project was an opportunity to apply skills I've learned in my classes as well as encounter challenges that I had never faced in a classroom setting. It was incredibly valuable to practice working with a team to design and execute a project, especially as I prepare for my highly team-oriented graduate program.

Sarah: Yes! I've enjoyed the chance to learn about something completely different from what I've done in the past, namely rating and ranking systems. Also, working with real data in large quantities taught me a lot about the challenges of large-scale computation.


Part of the group's challenge was to balance project management roles while tackling the various technical challenges of this type of project. While each student rotated through the role of scrum master, the DocRank team split the project pipeline into areas that best suited their technical strengths. Kaya drove the task of acquiring, matching, and merging additional relevant data to augment, normalize, and prepare the dataset for the ranking algorithm. Sarah designed and implemented a ranking algorithm that applied Colley's method over a feature matrix to score providers via matrix inversions, much like one would rank basketball teams. Ryan built the web app, which was backed by MongoDB and served data to a React UI via an API written in Node.js.
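
For readers curious about the math, here is a minimal sketch of Colley's method in Python. It is not the team's code. In sports ranking, Colley's method takes win/loss results and solves a linear system C r = b, where C has 2 plus the number of games on the diagonal and minus the number of head-to-head games off the diagonal, and b is 1 plus half of (wins minus losses). How DocRank turned its feature matrix into "games" is not described here, so the `matchups_from_features` helper below is purely an assumption for illustration: it treats every feature as a round in which the provider with the higher value "beats" the other. The function names and the toy data are hypothetical as well. The sketch also solves the system directly rather than forming an explicit inverse, which is usually cheaper and more stable.

```python
import numpy as np


def colley_ratings(num_items, matchups):
    """Compute Colley ratings from a list of (winner, loser) index pairs."""
    # Colley matrix starts as 2*I; right-hand side starts as all ones.
    C = 2.0 * np.eye(num_items)
    b = np.ones(num_items)
    for winner, loser in matchups:
        # Each game adds 1 to both participants' diagonal entries...
        C[winner, winner] += 1
        C[loser, loser] += 1
        # ...and subtracts 1 from the two off-diagonal entries for the pair.
        C[winner, loser] -= 1
        C[loser, winner] -= 1
        # b_i = 1 + (wins_i - losses_i) / 2
        b[winner] += 0.5
        b[loser] -= 0.5
    # Solve C r = b instead of computing inv(C) explicitly.
    return np.linalg.solve(C, b)


def matchups_from_features(features):
    """Hypothetical mapping from a feature matrix to pairwise 'games'.

    Assumption for illustration only: for each feature column, the
    provider with the strictly higher value wins; ties produce no game.
    """
    n, k = features.shape
    pairs = []
    for f in range(k):
        for i in range(n):
            for j in range(i + 1, n):
                if features[i, f] > features[j, f]:
                    pairs.append((i, j))
                elif features[j, f] > features[i, f]:
                    pairs.append((j, i))
    return pairs


# Toy data: 3 providers (rows) scored on 2 made-up features (columns).
features = np.array([[5.0, 3.0],
                     [4.0, 2.0],
                     [1.0, 1.0]])
ratings = colley_ratings(len(features), matchups_from_features(features))
ranking = np.argsort(-ratings)  # provider indices, best first
print(ratings, ranking)
```

A handy sanity check is that Colley ratings always average 0.5, whatever the inputs. For a dataset with millions of providers, a dense matrix like this one would not fit in memory, so a sparse representation and an iterative solver would likely be needed.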

To get access to their code, notebooks, and discussions, check out:

We thoroughly enjoyed working with Ryan, Sarah, and Kaya on this project. We were impressed by their unique approach to the ranking problem and by the full-stack solution they were able to build in such a short time. We are looking forward to the next round of capstone projects with CofC next year.