Darren Xu Blog

Scraping and deploying a website with Astro and GitHub Actions


What am I making

Some of my friends play in a futsal competition that uses a generic sports CMS to display results, scores and fixtures, as well as player statistics and match data. The website has a horrible user experience on mobile because the tables aren’t responsive. On top of that, you have to select the competition every single time, as there is no URL for a preselected competition. I wanted to create a simple, fast mobile site that shows the current table, recent results and upcoming fixtures.

Front end technology

I decided to use Astro for the front end as I wanted to try it out on a full project. Originally I wanted to use its hydration features, but I ran into some issues with this that I will explain later.

Getting the data

To get the data from the website, I expected to use Playwright to visit the URL, switch to the correct tab by selecting an item from a dropdown menu, and scrape the resulting table. Surprisingly, after I selected the competition, the whole data object for the table was logged to the console. I used Playwright’s built-in console event listener to collect this data and populate the relevant tables.
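As a rough sketch of the console-listener approach (not the exact code from this project): Playwright emits a `console` event on the page for every message the site logs, and each logged argument can be read back with `jsonValue()`. The `HandleLike`, `ConsoleMessageLike` and `PageLike` interfaces and the `collectLoggedObjects` name below are hypothetical stand-ins so the snippet runs without a browser; they are meant to mirror only the parts of Playwright's `Page` and `ConsoleMessage` used here.

```typescript
// Hypothetical minimal shapes mirroring the Playwright members used below.
interface HandleLike {
  jsonValue(): Promise<unknown>;
}
interface ConsoleMessageLike {
  type(): string;
  args(): HandleLike[];
}
interface PageLike {
  on(event: "console", listener: (msg: ConsoleMessageLike) => void): unknown;
}

// Starts listening for console.log calls and returns a function that
// resolves to every non-null object logged up to the point it is called.
function collectLoggedObjects(page: PageLike): () => Promise<object[]> {
  const pending: Promise<unknown[]>[] = [];

  page.on("console", (msg) => {
    if (msg.type() !== "log") return; // skip warnings, errors, etc.
    // Serialise each logged argument out of the browser context.
    pending.push(Promise.all(msg.args().map((arg) => arg.jsonValue())));
  });

  return async () => {
    const objects: object[] = [];
    for (const values of await Promise.all(pending)) {
      for (const value of values) {
        if (typeof value === "object" && value !== null) objects.push(value);
      }
    }
    return objects;
  };
}
```

With a real Playwright page, the idea would be to call `collectLoggedObjects(page)` before selecting the competition in the dropdown, then await the returned function once the table has loaded and pick the table data out of the result.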

Combining the data

Originally I implemented the Astro app with static data, but on the deployed site I wanted the website to make an API call that runs a scrape and fetches the latest data. There were a few problems with this.

To solve this, I decided to just run the scrape at Astro build time, which means the data is not fetched fresh on each page visit, only on each deploy. I figured this was good enough: the matches only happen once a week, so the tables will be up to date most of the time.

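I had to use the following code in the GitHub Actions YAML file to save the result of the scrape into an environment variable, as the JSON response caused issues with the bash output. It uses the delimiter syntax that GitHub Actions provides for multi-line values.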

# Quote $OUTPUT so the JSON is written as-is (no word splitting or globbing).
echo 'JSON_RESPONSE<<EOF' >> "$GITHUB_ENV"
echo "$OUTPUT" >> "$GITHUB_ENV"
echo 'EOF' >> "$GITHUB_ENV"
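
On the Astro side, the build then needs to read that variable back. The code for that is not shown here, so the following is only a sketch under assumptions: the `StandingRow` shape, the `loadStandings` name and the fallback behaviour are all hypothetical, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical row shape; the real scraped object will have its own fields.
interface StandingRow {
  team: string;
  points: number;
}

type Env = Record<string, string | undefined>;

// Parses the JSON stored in JSON_RESPONSE, falling back to static data
// when the variable is missing, empty, malformed, or not a JSON array.
function loadStandings(env: Env, fallback: StandingRow[]): StandingRow[] {
  const raw = env["JSON_RESPONSE"];
  if (raw === undefined || raw.trim() === "") return fallback;
  try {
    const parsed: unknown = JSON.parse(raw);
    // Only the outer array is checked; row fields are trusted as-is.
    return Array.isArray(parsed) ? (parsed as StandingRow[]) : fallback;
  } catch {
    return fallback; // malformed JSON: keep the site building
  }
}
```

In a statically built Astro site, something like `loadStandings(process.env, staticRows)` could be called from a page's frontmatter, which runs at build time, so local builds without the variable would still render the static data.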