September 5th, 2022 (Labor Day)
37,-5 (Atlas Norte)
By howieDoin – Business Development and Infrastructure at Atlas Corp
How we doin’? I want to start by thanking the community for participating in our DAO Grant proposal process, especially the 75 of you who voted. It was a pleasure to have sparked such passionate, thoughtful debate about the future of data and analytics in Decentraland, both in the forums and on Discord. At Atlas Corp we get some of our best work done through fierce debate among ourselves, which sometimes causes the right idea to pop out unexpectedly, so it’s great to see that paradigm applied on a metaverse-wide scale. We’re thrilled to support @Dax’s mission to build dashboards on top of the data source we’re assembling. Dax’s use case will provide real-world user stories for how the data should eventually be formatted and distributed, so we will be working closely together throughout our development process.
Our development process typically begins with discovery, so the first month has been about exploration. Because our infrastructure choices are predicated on the size and growth rate of the data, the first thing we did was set up pipelines to collect data from <catalyst-url>/comms/islands on all 11 active nodes and save it to a database so we could monitor its growth. Over the last 30 days we have collected 1,490,198 documents, which take up about 7.9 gigabytes on disk, a bit less than we initially estimated for 20-second granularity. The index on the timestamp takes up about 75 MB. These figures will help refine our early cost estimates, but they represent only one monthly data point, and volume may vary wildly from month to month as large Decentraland events bring more users in-world for longer. They also reflect only the current state of the metaverse and do not account for the inevitable growth in daily active users, so we’ll continue to monitor the data to make sure we’re planning for enough future growth. The data we collect now, while not truly production data due to potential quality issues and gaps, will be added to the public data archives we’re planning to set up.
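As a rough illustration of how these figures can feed a cost estimate, the sketch below derives an average document size and a naive 30-day projection. It is a back-of-the-envelope calculation, not the actual costing model. The function names are made up for this example, gigabytes are assumed to be decimal (10^9 bytes), and the nominal sample count assumes exactly one document per node per 20-second interval, which may not match how the collector stores data.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def nominal_samples(nodes: int, days: int, interval_s: int) -> int:
    """Documents expected if every node yields one document per interval (assumption)."""
    return nodes * (days * SECONDS_PER_DAY // interval_s)


def average_document_bytes(total_bytes: float, documents: int) -> float:
    """Mean on-disk size per document."""
    return total_bytes / documents


def projected_gigabytes(avg_doc_bytes: float, nodes: int, days: int, interval_s: int) -> float:
    """Naive projection: nominal sample count times average size, in decimal GB."""
    return nominal_samples(nodes, days, interval_s) * avg_doc_bytes / 1e9


if __name__ == "__main__":
    # Figures reported above: 1,490,198 documents in about 7.9 GB over 30 days.
    avg = average_document_bytes(7.9e9, 1_490_198)
    print(f"average document size: {avg:,.0f} bytes")
    print(f"nominal samples (11 nodes, 30 days, 20 s): {nominal_samples(11, 30, 20):,}")
    print(f"projected size at that rate: {projected_gigabytes(avg, 11, 30, 20):.2f} GB")
```

Because a single month is only one data point, a projection like this is best treated as a baseline to be revised as more months of data arrive.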
I’m hesitant to call what we’ve collected to date “production-quality” because of issues we’ve encountered while trying to collect the data in a consistent, standardized format. The following are some of the issues that we, and others, had when interacting with the catalyst API endpoints over the period. They will have to be resolved to provide clean data to the community:
· 400-type errors indicative of unauthorized access. After some experimentation and community outreach, we found that adding a User-Agent header fixed the issue. While some consider this best practice, the documentation does not mention that it is required.
· Several 529 errors (“Too many requests”) returned as an NGINX HTML page. This error is curious because we captured data from each node at consistent 20-second intervals. It would seem to indicate either misuse of the standard error code (access was withheld for a different reason) or a rate-limiting policy that counts requests across all users of the API. In the latter case, one bad actor can disable data collection for everyone, instead of only the bad actor being throttled.
· DNS handshake failures, usually indicative of a node being offline and unreachable via its URL. Given that we trust these nodes to be up and available so that Decentraland users have a good experience, it would be good to know the details of any SLAs (e.g. uptime requirements) that node operators commit to, and to understand who is meeting them.
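The issues above suggest a collector that always sends a User-Agent header and sorts failures into categories before deciding what to do next. The following is a minimal sketch using only the Python standard library, not the actual Atlas Corp pipeline. The User-Agent string, the function names, the category labels, and the set of status codes treated as throttling are all assumptions made for illustration.

```python
import json
import socket
import urllib.error
import urllib.request

# Hypothetical identifier; any descriptive User-Agent string would do.
USER_AGENT = "example-islands-collector/0.1"

# Assumption: 429 is the standard "Too Many Requests" status, and 529 is the
# code reported above. Adjust to whatever the nodes actually return.
THROTTLE_CODES = frozenset({429, 529})


def build_request(catalyst_url: str) -> urllib.request.Request:
    """Build a request for /comms/islands that carries a User-Agent header."""
    url = catalyst_url.rstrip("/") + "/comms/islands"
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def classify_failure(exc: Exception) -> str:
    """Map an exception to a coarse failure category for logging and retries."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in THROTTLE_CODES:
            return "throttled"
        if 400 <= exc.code < 500:
            return "client_error"
        return "server_error"
    if isinstance(exc, urllib.error.URLError):
        # Name resolution failures surface as socket.gaierror in `reason`.
        if isinstance(exc.reason, socket.gaierror):
            return "dns_failure"
        return "network_error"
    if isinstance(exc, ValueError):
        # Covers json.JSONDecodeError, e.g. an HTML page where JSON was expected.
        return "bad_payload"
    return "unknown"


def fetch_islands(catalyst_url: str, timeout_s: float = 10.0):
    """Return (payload, None) on success or (None, failure_category) on failure."""
    try:
        with urllib.request.urlopen(build_request(catalyst_url), timeout=timeout_s) as resp:
            return json.load(resp), None
    except (OSError, ValueError) as exc:
        return None, classify_failure(exc)
```

Recording the category alongside each failed sample would make it possible to tell apart gaps caused by throttling, by a node being offline, and by malformed responses when the data is later audited.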
Much of our actual development, which we plan to begin next month, will be designing a system that collects this data resiliently. We’ll be designing and implementing the architecture as a robust production system with features like…
· Redundancy to keep data collection going even if there is an infrastructure outage
· Logging and error reporting, to catch instances when data formats come in looking like anything but what is expected
· Deployment to infrastructure that can keep up with the community’s somewhat unpredictable future data collection and query needs
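To make the logging and error-reporting item more concrete, here is a minimal sketch of a payload check that logs anything unexpected instead of silently storing it. It is an illustration, not the planned implementation. The logger name, the top-level key `islands`, and the per-island fields `id` and `peers` are hypothetical placeholders, since the actual response schema is not described in this update.

```python
import logging

logger = logging.getLogger("islands_collector")  # hypothetical logger name

# Hypothetical schema: field name -> expected Python type.
REQUIRED_ISLAND_FIELDS = {"id": str, "peers": list}


def validate_payload(payload: object, node: str) -> list[str]:
    """Return a list of problems found in one response, logging each of them."""
    problems: list[str] = []
    if not isinstance(payload, dict):
        problems.append(f"payload is {type(payload).__name__}, expected dict")
    else:
        islands = payload.get("islands")
        if not isinstance(islands, list):
            problems.append("missing or non-list 'islands'")
        else:
            for index, island in enumerate(islands):
                if not isinstance(island, dict):
                    problems.append(f"island {index} is not an object")
                    continue
                for field, expected in REQUIRED_ISLAND_FIELDS.items():
                    if not isinstance(island.get(field), expected):
                        problems.append(
                            f"island {index}: '{field}' missing or not {expected.__name__}"
                        )
    for problem in problems:
        logger.warning("node=%s %s", node, problem)
    return problems
```

A collector could store the raw response either way and flag the record when the returned list is non-empty, so that format drift is visible without losing data.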
As we’re planning to make use of MongoDB Atlas, we’ve been in discussions with the MongoDB developer relations team to understand what avenues are available to bolster the DAO’s investment in this data source. While we’re still early in our conversations, the DAO may be able to (a) receive architectural review and optimization from MongoDB engineering experts, (b) participate in promotion and outreach about the use case and Decentraland to MongoDB’s community, and (c) potentially benefit from internal MongoDB sponsorship programs that stretch the infrastructure budget further. More to come on this topic in future updates.
We’ll provide another update next month with our progress. We’re always looking for user stories, so if you plan on making use of this data, feel free to reach out and we’ll do our best to work with you to incorporate additional user stories into an API.