MELON SOUR

Building an App that Sorts Idols

Programming

08/06/2024

programming, idols

Idol Sort

Simping for idols is one of my hobbies from like a decade ago, so a while back I decided to make an app that lets you sort them à la Facemash. The premise is that you're given two choices from which you pick the 'better' member. The algorithm keeps presenting choices until it has enough information to sort the entire group, at which point it shows the full sorted list. The idea isn't particularly new, even in the domain of idols: sites already exist for more famous groups, but I wanted one that could easily scale to new groups I liked and adapt to changes in members. It was originally intended to be a small learning project, but things turned out to be way more chaotic and interesting under the surface.

It also didn't help that I tend to do my personal coding projects in between turns of a children's card game, which is why this thing took upwards of a year to do, but it's done and released in all its glory. I present to you, Idol Sort.

Image Acquisition

First step of building this thing was to download all the images from the official sites of the respective groups. I could've done this manually, but underground idol companies often fire members or give surprise graduations, resulting in frequent member changes. A more naive motivation could be that I'd want to update the images when newer outfits are posted. Either way I needed something more automatic, which meant pulling out some good ol' bash scripting.

#!/bin/bash
# scrape the profile page and keep the 4th double-quote-delimited field
# (the image url) of every line that references a profile image
image_urls=($(curl "https://2zicon.tokyo/profile/" | grep -E 'img/pf/[a-z]' | awk -F'"' '{print $4}'))
# get member names from image url
image_names=()
for url in "${image_urls[@]}"; do
  result=$(echo "$url" | grep -Po "[a-zA-Z]+(?=-[0-9])")
  image_names+=("$result")
done
mkdir -p images
# curl member profile images with member name as file name
index=0
for url in "${image_urls[@]}"; do
  filename="${image_names[index]}.jpg"
  curl -o "images/$filename" "$url"
  ((index++))
done

All the sites obviously had a unique structure but the general extraction procedure was the same: regex the format of the profile images, loop through them, then curl each image whilst saving it under the member's name. Then I wrote a small script that tinify'd the images and uploaded them to an S3 bucket.

Insane Sorting Logic

Next step was to write the logic that ran the sorting feature. There are a ton of resources on the internet about algorithmically sorting data when the comparison between items is static. For example, when sorting numbers from high to low there's no uncertainty when comparing two numbers: 5 is always going to come ahead of 3.

However in my case, the 'value' of each item is determined in realtime by the user so the algorithm has to adapt and sort new combinations based off the previous picks. I scoured the internet for anything related to this and arrived at this subjective sort app which does just the thing for a custom set of items. The url gave me a hint towards the repo which contained the logic in its entirety, perfect.

The problem was grokking this insane algorithm in order to adapt it for usage in a React app. The process was a lot of black-box trial and error which resulted in a healthy dose of useEffects, event handlers, edge-case branching and of course, recursion. This took months of hacking (I'll spare you the eldritch amalgamation I've produced) but in the end it was done, the minimum viable version of the app was complete on my local machine. Cue a half year break in development because I definitely deserved it.
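For a rough idea of what sorting by user picks involves, here is a minimal sketch. It is not the logic from the repo mentioned above; it assumes a plain merge sort, written as a generator so that it can pause at every comparison and wait for a pick. The names `sortByChoice` and `runSort` are made up for this example.

```javascript
// Merge sort written as a generator: it yields each pair it needs
// compared and resumes once the caller passes back the winner.
function* sortByChoice(items) {
  if (items.length <= 1) return items.slice()

  const mid = Math.floor(items.length / 2)
  const left = yield* sortByChoice(items.slice(0, mid))
  const right = yield* sortByChoice(items.slice(mid))

  // Merge the two sorted halves, asking for a pick only when needed.
  const merged = []
  let i = 0
  let j = 0
  while (i < left.length && j < right.length) {
    const winner = yield [left[i], right[j]]
    if (winner === left[i]) merged.push(left[i++])
    else merged.push(right[j++])
  }
  return merged.concat(left.slice(i), right.slice(j))
}

// Drive the generator with any picker function.
// In an app the "picker" would be the user's click instead of a callback.
function runSort(items, pick) {
  const sorter = sortByChoice(items)
  let step = sorter.next()
  while (!step.done) {
    const [a, b] = step.value
    step = sorter.next(pick(a, b))
  }
  return step.value
}
```

In a React component the generator could live in a ref, with each click handler calling `next(pickedMember)` and rendering the next yielded pair, which avoids re-deriving the sort state from scratch on every pick.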

Bombastic Ranking Algo

Edit 15/06/2024: Due to EC2 prices I've removed the ranking feature for now :(

I wanted to add a bit more functionality to differentiate this app from the existing ones, so I added some social features that would make it fun to play with. People want to see how their favorite members compare to others, so I decided to add a page that showed the average ranking for each member. This didn't seem too hard at first; I thought it'd just take a simple SQL statement to calculate the total score divided by the number of user submissions. However, this failed to account for the aforementioned fact that idols are unfortunately easily replaced.

Say for example a group with members ABC had a list submitted with the rankings 123.

A  B  C
1  2  3

After that C gets terminated and someone submits another list.

A  B  C
1  2  3
2  1  NULL

At first glance the average placement across two data points seems like it'd be 1.5 for both A and B, whilst C stays at 3. However this isn't accurate because the weight for placements in each list is different. Getting 1st place in a group size of 2 should be worth less than getting 1st place in a group size of 3. Therefore I had to scale the placements depending on the number of 'competing' participants.

To do this I first calculated the average placement for each row, which is the sum of all non-null values divided by the largest non-null value (in other words, the number of members ranked in that row).

A  B  C     Average Placement
1  2  3     6/3 = 2
2  1  NULL  3/2 = 1.5

The maximum average placement (which in this case is 2) will be used as the baseline for adjusting all other rows. To calculate the scaling factor for other rows, I simply divided 2 by the average placement of that row.

A  B  C     A.P          Scaling Factor
1  2  3     6/3 = 2      2/2 = 1
2  1  NULL  3/2 = 1.5    2/1.5 = 1.333

Finally the scaling factor is multiplied against the original rankings to create a ranking that's properly scaled against every other record.

A      B      C
1      2      3
2.666  1.333  NULL

Here we see that getting 1st place out of two people is valued the same as getting 1.333 place out of three people. This algorithm allows new and old members alike to receive a fair result regardless of the size of the group or number of records they've previously received.
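The steps above can be sketched in plain JavaScript. This is an illustration of the arithmetic only, not the SQL the app runs; the function names and the object-per-submission shape are assumptions, and the final per-member mean is my reading of "average ranking". Every row is assumed to contain at least one non-null placement.

```javascript
// Each submission maps member -> placement, with null for members
// who were not in the group when the list was submitted.
function scaleSubmissions(submissions) {
  // Average placement of a row: sum of the non-null placements
  // divided by the largest non-null placement.
  const averages = submissions.map((row) => {
    const placed = Object.values(row).filter((p) => p !== null)
    return placed.reduce((sum, p) => sum + p, 0) / Math.max(...placed)
  })
  // The largest average placement is the baseline for every row.
  const baseline = Math.max(...averages)

  return submissions.map((row, i) => {
    const factor = baseline / averages[i]
    return Object.fromEntries(
      Object.entries(row).map(([member, p]) => [member, p === null ? null : p * factor])
    )
  })
}

// Mean scaled placement per member, skipping lists they were absent from.
function averageRanking(submissions) {
  const totals = {}
  for (const row of scaleSubmissions(submissions)) {
    for (const [member, p] of Object.entries(row)) {
      if (p === null) continue
      if (!totals[member]) totals[member] = { sum: 0, count: 0 }
      totals[member].sum += p
      totals[member].count += 1
    }
  }
  return Object.fromEntries(
    Object.entries(totals).map(([member, t]) => [member, t.sum / t.count])
  )
}

const submissions = [
  { A: 1, B: 2, C: 3 },
  { A: 2, B: 1, C: null },
]
console.log(scaleSubmissions(submissions))
console.log(averageRanking(submissions))
```

With the two example lists, the second row is scaled by 2/1.5, so A and B land at roughly 2.667 and 1.333 there, while the first row is left unchanged because it already sits at the baseline.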

The somewhat complex manipulation, however, meant that I had to dynamically generate the SQL string with JavaScript since Prisma didn't provide an easy API for this. A small hint of the amalgamation looked something like this.

const largestPlacement =
  memberJson[group]
    .reduce((string, member) => {
      return string + `COALESCE(${member.name}, 0),`
    }, 'GREATEST(')
    .slice(0, -1) + ')'
const fromCompetitiveScaleFromGroup = `) as competitive_scale FROM ${formatGroupName(group)}`
const suffix = `) as multiplied inner join ${formatGroupName(group)} where multiplied.id = ${formatGroupName(group)}.id ) as average;`

Yeah it wasn't pretty. It took months (of procrastination).
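For reference, the `GREATEST(...)` fragment from the snippet above can also be built with `map` and `join`, which avoids slicing off the trailing comma. This is only a sketch: `largestPlacementSql` is a made-up name and the `members` array is a hypothetical stand-in for `memberJson[group]`.

```javascript
// Build GREATEST(COALESCE(col, 0),...) from a list of member columns.
// `members` is a hypothetical stand-in for memberJson[group].
function largestPlacementSql(members) {
  const columns = members.map((member) => `COALESCE(${member.name}, 0)`)
  return `GREATEST(${columns.join(',')})`
}

const members = [{ name: 'a' }, { name: 'b' }, { name: 'c' }]
console.log(largestPlacementSql(members))
// GREATEST(COALESCE(a, 0),COALESCE(b, 0),COALESCE(c, 0))
```

Since the column names are interpolated straight into the SQL string, this only makes sense when they come from a trusted config file rather than from user input.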

Assortment of Infrastructure

I've intentionally over-engineered this project a bit to gain some learning experience in infrastructure, which I knew I was weak at, specifically Docker and AWS. That resulted in the architecture below.

The node container serves a Next.js project which uses Prisma to connect with the database. By hacking together this thing I've learnt things like

  • Setting up and developing in Docker containers
  • Making and routing a CloudFront distribution with different origin and cache settings
  • Automating the upkeep of the application with things like an Elastic IP and PM2

Overall it was fun, but there was no way I'd have managed to string together the spaghetti that is the backend of this thing if it weren't for my innate interest in the domain. My recommendation for maintaining motivation in side-projects until completion would be to pick an application that you'd personally want to use. There were a lot of small features, like auto-composing a tweet based off your list or deterring multiple votes with cookies, but I think I'll be closing this write-up around here. The end.