DocRank: Computer Science Capstone Project with the College of Charleston

Throughout the 2017 spring semester, PokitDok sponsored a capstone project through the Computer Science Department at the College of Charleston. PokitDok provided a curated dataset of 5+ million providers, a project concept, and mentorship for three CofC senior students. We challenged the students to build a full-stack application with an intuitive interface for searching, ranking, and viewing the providers in the dataset. The devil is in the details, and the CofC students delivered a solid solution.

We would like to introduce you to the three members of the class of 2017 who took on this challenge: Ryan Lile (Computer Science), Kaya Tollas (Data Science/Mathematics), and Sarah Wiegreffe (Data Science). This team designed and built a product called DocRank, a mathematically empowered web app that recommends doctors to users according to a variety of input parameters.

 

Amidst the renowned CofC graduation festivities, we caught up with the trio to gain insight into why they chose this project and their overall experience.

 

 

Why did you select the PokitDok DocRank Project?

Kaya Tollas: I selected PokitDok's DocRank project for a variety of reasons. My mother is a doctor, so I have always been interested in helping to solve the many inefficiencies surrounding healthcare. I also wanted to gain experience in the data acquisition and cleaning process as well as the process of designing and justifying a meaningful ranking algorithm.

Ryan Lile: I was excited to learn more about data science and ranking algorithms. It seemed like a great project to apply what I had already learned from school and also learn a lot about some new technologies.

Sarah Wiegreffe: I wanted to be part of a project with a significant data science component and one which would allow me the chance to learn something new that was a bit outside the realm of my coursework.

 

Which technical challenges were you most looking forward to tackling with this project and why?

Ryan: I most looked forward to connecting the three pieces (database, server, front end). I had individually done each of these but had not yet built the whole stack from scratch.

Sarah: The matrix algebra involved in calculating the rankings for the doctors, because I had never worked in this field before and got to read and learn a lot about the math involved in this unsupervised learning task.

Kaya: I was looking forward to learning how to scrape and parse HTML data since I had only previously worked with pre-processed and formatted data.

 

How did your expectations change as you advanced through the project?

Sarah: We realized that the sparsity of the publicly available data we could find, as well as the sheer size of our dataset, would make the ranking calculation, especially the matrix inversion used to generate the rankings, orders of magnitude more computationally expensive than we had expected, and we had to change our plans accordingly.

Kaya: I had never worked with such a large dataset, so the biggest adjustment I had to make was accounting for the time and computing power necessary to apply changes to the data.

Ryan: Initially I was hoping to just have a working front end that was very simple and basic. Through the project, as I learned React, I was excited to actually build a more robust front end and user experience. We were hoping to do more with natural language processing to enhance the user experience but we were unable to focus on that.

 

What were the top 3 most valuable experiences that you gained by working with PokitDok for this semester project?

Kaya: My most valuable experiences this semester were learning HTML scraping skills, encountering and learning to overcome the complexity and challenges of real-world data (veracity, inaccessibility, difficult formats, etc.), and engaging in the process of identifying and justifying which data are relevant for a meaningful ranking algorithm.

Sarah: Planning and coordinating as a team, consulting with Dr. Amy Langville on the mathematics, and presenting the project to an audience.

Ryan: Working on a real-world project from requirements to completion. Learning a lot of new technologies in order to accomplish our project. Learning to work with a team and communicate effectively.

 

 

Would you recommend this experience to another student? Why?

Ryan: Absolutely! Being able to dive into a real-world project and figure it all out on your own was a blast. In school, you get a lot of very specific problems that have a single domain or use. Having the opportunity to build out more robust problems ties all of your classes together and provides a deeper understanding of how things work.

Kaya: I would absolutely recommend this experience to another student. This project was an opportunity to apply skills I've learned in my classes as well as encounter challenges that I had never faced in a classroom setting. It was incredibly valuable to practice working with a team to design and execute a project, especially as I prepare for my highly team-oriented graduate program.

Sarah: Yes! I've enjoyed the chance to learn about something completely different from what I've done in the past, namely rating and ranking systems. Also, working with real data in large quantities taught me a lot about the challenges of large-scale computation.

 

Part of the group's challenge was to balance project management responsibilities while tackling the various technical challenges of this type of project. While each student rotated through the role of scrum master, the DocRank team split the project pipeline into areas that best suited their technical strengths. Kaya drove the task of acquiring, matching, and merging additional relevant data to augment, normalize, and prepare the dataset for the ranking algorithm. Sarah designed and implemented a ranking algorithm that applied Colley's Algorithm over a feature matrix to score providers via matrix inversions, much like one would rank basketball teams. Ryan built the web app, which was backed by MongoDB and served data to a React UI via an API in Node.js.
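For readers who have not seen Colley's method before, here is a minimal Python sketch of its classic sports-ranking form. It is not the DocRank team's code. The post does not describe how provider features were turned into comparisons, so the sketch assumes a hypothetical list of pairwise (winner, loser) outcomes, and the provider names and the functions colley_ratings and solve are illustrative only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] so row operations update both.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def colley_ratings(items, matchups):
    """Return Colley ratings for `items` given (winner, loser) pairs."""
    index = {name: i for i, name in enumerate(items)}
    n = len(items)
    # The Colley matrix C starts as 2*I and the right-hand side b as all ones.
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in matchups:
        w, l = index[winner], index[loser]
        C[w][w] += 1.0   # each comparison adds one to both diagonals
        C[l][l] += 1.0
        C[w][l] -= 1.0   # and subtracts one from the shared off-diagonals
        C[l][w] -= 1.0
        b[w] += 0.5      # b_i = 1 + (wins_i - losses_i) / 2
        b[l] -= 0.5
    return dict(zip(items, solve(C, b)))


# Hypothetical example: three providers and three pairwise outcomes.
providers = ["dr_a", "dr_b", "dr_c"]
outcomes = [("dr_a", "dr_b"), ("dr_a", "dr_c"), ("dr_b", "dr_c")]
ratings = colley_ratings(providers, outcomes)
ranked = sorted(providers, key=ratings.get, reverse=True)
```

The sketch solves the linear system C r = b directly instead of forming an explicit inverse of C. Both give the same ratings, but a direct solve is usually the cheaper route, which matters at the scale of millions of providers.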

To get access to their code, notebooks, and discussions, check out:

We thoroughly enjoyed working with Ryan, Sarah, and Kaya on this project. We were impressed by their unique approach to the ranking problem and to the full stack solution they were able to build in such a short time. We are looking forward to the next round of capstone projects with CofC next year.

About Denise Gosnell, PhD

Dr. Denise Gosnell is a Technology Evangelist at PokitDok, specializing in blockchain, machine learning applications of graph analytics, and data science. She joined in 2014 as a Data Scientist, bringing her domain expertise in applied graph theory to extract insight from the trenches of healthcare databases and build products to bring the industry into the 21st century.

Prior to PokitDok, Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research on how our online interactions leave behind unique identifiers that form a “social fingerprint” led to presentations at major conferences from San Diego to London and drew the interest of such tech industry giants as Microsoft Research and Apple. Additionally, she was a leader in addressing the underrepresentation of women in her field and founded a branch of Sheryl Sandberg’s Lean In Circles.


About Ryan Lile

Ryan Lile, a Charleston native, graduated with a BS in Computer Science in May of 2017 from the College of Charleston. During school he worked in a research lab with the Gettysburg Foundation to redesign their ticketing system on Salesforce and with the College of Charleston Admissions office doing web application development. Ryan also did research during school on data migration through RESTful APIs. He is very excited to transition into the workforce to learn and grow in his skills and knowledge.


About Kaya Tollas

Kaya Tollas is a data science student beginning a one-year Master of Science in Analytics program at the University of San Francisco. She hopes to make a difference as a data scientist by contributing to solutions in sustainability, resource management, and transportation.


About Sarah Wiegreffe

Sarah Wiegreffe recently graduated summa cum laude from the College of Charleston with a bachelor of science in data science and minors in mathematics and international studies. She will be pursuing a doctorate in computer science at Georgia Tech's School of Interactive Computing in the fall. Her interests include natural language processing and machine learning with social computing applications. Outside of computer science, she enjoys traveling (she spent a semester abroad in Estonia as an undergraduate and a year in France before that), cooking, and rock music.

