Technical Architecture
Last updated
The first step for a search engine is to locate data and fetch it. The goal of this layer is to collect all Web3-related data regardless of where it was originally stored, such as Ethereum transaction data, Lens posts stored on Arweave decentralized storage, Twitter data on centralized servers, and user data in browsers.
The parser analyzes the original data structure so that the engine can create more signals to support search. Each individual data point is called a page. A page can be an article, a social post, transaction data, or the metadata of an NFT. Web2 search engines mostly extract unstructured information, for example by splitting text into words, which is why the development of knowledge graphs and web structure has stalled in the Web2 era. By comparison, Adot, as a Web3 search engine, recognizes the complicated structure of each page. A structured data system not only makes it easier to build a knowledge graph and improve the quality of search results but also helps clarify the owner of each piece of data, so that data providers can be rewarded through a solid tokenomics mechanism.
Each page passes through a page structure transformer, which converts the page's original format into a unified structure. The unified structure describes the page's detailed structure and marks which information is important, e.g., the title of an article, the owner of an NFT, or the followers of a user, and which parts should be ignored.
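As a minimal sketch of what such a transformer might look like, the Python below maps raw pages of different types onto one shared shape. The class `UnifiedPage`, the `FIELD_MAP` rules, the raw key names, and the ignored keys are all hypothetical and chosen only for illustration; they do not describe Adot's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class UnifiedPage:
    """Hypothetical unified structure shared by every page type."""
    page_id: str
    page_type: str                      # e.g. "article", "nft"
    title: Optional[str] = None
    owner: Optional[str] = None
    fields: Dict[str, Any] = field(default_factory=dict)


# Assumed per-type rules: unified field name -> key in the raw page.
FIELD_MAP = {
    "article": {"title": "headline", "owner": "author_address"},
    "nft": {"title": "name", "owner": "owner_address"},
}

# Assumed raw keys that carry no search value and should be dropped.
IGNORED_KEYS = {"tracking_pixel", "html_boilerplate"}


def transform(page_id: str, page_type: str, raw: Dict[str, Any]) -> UnifiedPage:
    """Convert one raw page into the unified structure."""
    mapping = FIELD_MAP.get(page_type, {})
    title_key = mapping.get("title")
    owner_key = mapping.get("owner")
    # Keys already mapped or explicitly ignored are not copied into `fields`.
    consumed = {title_key, owner_key} | IGNORED_KEYS
    extra = {k: v for k, v in raw.items() if k not in consumed}
    return UnifiedPage(
        page_id=page_id,
        page_type=page_type,
        title=raw.get(title_key) if title_key else None,
        owner=raw.get(owner_key) if owner_key else None,
        fields=extra,
    )
```

Calling `transform("nft-1", "nft", {...})` on raw NFT metadata would yield a `UnifiedPage` whose `title` and `owner` are filled from the mapped keys, while unmapped keys that are not ignored are kept under `fields`.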
Cross-page relations are constructed by the knowledge graph builder. A cross-page relation is determined through both in-page structures and cross-page linkages. For example, a Mirror article points to an Etherscan webpage if the article contains a hyperlink to Etherscan, and two wallet addresses are linked together if they belong to the same user. These relations help mine the underlying knowledge of the whole Web3 domain and eventually build a smart search engine.
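The two example relations above can be sketched as edges in a small graph. This is only an illustration under stated assumptions: the input shapes (`pages`, `wallet_owner`), the relation names `links_to` and `same_owner`, and the URL regular expression are hypothetical, and a production builder would need far more robust link and identity resolution.

```python
import re
from collections import defaultdict
from typing import Dict, Set, Tuple

# Simple URL matcher; an assumption for this sketch, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s)\"'>]+")

Edge = Tuple[str, str, str]  # (source, relation, target)


def build_relations(
    pages: Dict[str, Dict[str, str]],
    wallet_owner: Dict[str, str],
) -> Set[Edge]:
    """Derive cross-page edges from hyperlinks and shared wallet ownership.

    pages:        page_id -> {"url": ..., "text": ...}
    wallet_owner: wallet address -> user id
    """
    url_to_id = {p["url"]: pid for pid, p in pages.items()}
    edges: Set[Edge] = set()

    # Hyperlink relation: page A links to page B if A's text contains B's URL.
    for pid, page in pages.items():
        for url in URL_RE.findall(page["text"]):
            target = url_to_id.get(url.rstrip(".,"))
            if target is not None and target != pid:
                edges.add((pid, "links_to", target))

    # Ownership relation: wallets belonging to the same user are linked.
    by_user = defaultdict(list)
    for address, user in wallet_owner.items():
        by_user[user].append(address)
    for addresses in by_user.values():
        addresses.sort()
        for i, a in enumerate(addresses):
            for b in addresses[i + 1:]:
                edges.add((a, "same_owner", b))

    return edges
```

Storing relations as `(source, relation, target)` triples keeps the sketch independent of any particular graph database.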
This layer uses original data and parsed content to produce deep signals. NLP technologies are used to generate human-readable and machine-readable labels, and other AI technologies are used to annotate non-text content. The extra ranking signals computed here further improve the precision of the search algorithm. Users' search histories are also processed to generate labels and embeddings for users.
The engine follows an index protocol to build its index from the outputs of the content parser and modeling layers. The index is what allows any search engine to search through billions of pages and return relevant results to users in milliseconds. For background, see the inverted index data structure. Adot uses Elasticsearch (ES) to create its basic indexes.
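For readers unfamiliar with the inverted index, the following is a minimal pure-Python sketch of the idea: each term maps to the set of pages that contain it, so a query only touches the postings of its own terms. It is a teaching example, not a description of Elasticsearch internals or of Adot's index protocol; the function names and the tokenizer are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of page ids whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the pages that contain every term of the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Lookup cost depends on the length of the postings for the query terms rather than on the total number of pages, which is what makes millisecond responses feasible at scale.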
However, the traditional indexing process treats content as unstructured text, i.e., a bag of words, instead of structured knowledge. We therefore extend the ES index to integrate page structures and knowledge graphs. The structured index makes it possible to handle search queries with complicated structures and to rank search results more precisely using richer information.
When a user inputs a query, the search engine recalls relevant pages from the index and uses the search algorithms to aggregate the pages into a list of results for the user to review.
The query processor uses NLP technologies to analyze both explicit keywords and implicit context of the query and thus fully understand what users are looking for. Structured queries also allow users and developers to describe their needs more precisely.
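To illustrate what a structured query could look like, the sketch below splits a raw query string into free-text keywords and `field:value` filters. The `field:value` syntax, the function name `parse_query`, and the example field names are hypothetical and are not taken from Adot's query language; the NLP analysis of implicit context is out of scope here.

```python
import re
from typing import Dict, List, Union

# Hypothetical filter syntax: field:value, with no spaces inside the value.
# Note that this simple pattern would also treat a raw URL as a filter.
FILTER_RE = re.compile(r"(\w+):(\S+)")


def parse_query(raw: str) -> Dict[str, Union[List[str], Dict[str, str]]]:
    """Separate explicit keywords from structured filters."""
    filters = {key.lower(): value for key, value in FILTER_RE.findall(raw)}
    # Remove the filter tokens, then treat what remains as keywords.
    remaining = FILTER_RE.sub(" ", raw)
    keywords = [word.lower() for word in remaining.split()]
    return {"keywords": keywords, "filters": filters}
```

For example, `parse_query("type:nft owner:0xabc Bored Ape")` separates the two filters from the keywords `bored` and `ape`, so the engine could match the filters against page structure and the keywords against text.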
The technology of multi-layer ranking uses more signals to rank the results than a traditional search engine. Text, page structures, graph relationships, and complex signals are combined to rank results into a list of relevant pages. Personalization and diversification modules customize results to fit either a universal scenario or a specific use case.
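One common way to realize multi-layer ranking is a two-stage pipeline: a cheap first layer shortlists candidates, and a second layer re-ranks the shortlist with a richer combination of signals. The sketch below assumes this design; the `Candidate` fields, the `WEIGHTS` values, and the linear combination are illustrative assumptions, not Adot's actual ranking model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    page_id: str
    text_score: float        # keyword relevance
    structure_score: float   # match against page structure
    graph_score: float       # knowledge-graph relationship strength


# Hypothetical weights for the second layer; real systems would tune or learn these.
WEIGHTS = {"text": 0.5, "structure": 0.3, "graph": 0.2}


def combined_score(c: Candidate) -> float:
    """Linear combination of the three signals."""
    return (
        WEIGHTS["text"] * c.text_score
        + WEIGHTS["structure"] * c.structure_score
        + WEIGHTS["graph"] * c.graph_score
    )


def rank(candidates: List[Candidate], first_stage_k: int = 100, top_n: int = 10) -> List[Candidate]:
    """Layer 1 shortlists by text score; layer 2 re-ranks by combined score."""
    shortlist = sorted(candidates, key=lambda c: c.text_score, reverse=True)[:first_stage_k]
    return sorted(shortlist, key=combined_score, reverse=True)[:top_n]
```

The trade-off of this design is visible in the shortlist size: a small `first_stage_k` is cheaper but can drop pages whose structural or graph signals are strong while their text score is weak.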
Search results can support various dApps. For example, explicit search results can power a search engine or the search box in a social dApp, while implicit search results can power a recommendation engine or a data analysis platform.
When a user uses a supported dApp, the user sends either an explicit query, e.g., search keywords in a search engine, or an implicit query, e.g., a click on a marketplace page. The dApp sends the query together with its context to the engine. After receiving the results, the dApp uses its own UI to present the data to the user.
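The request a dApp sends could be modeled roughly as follows. This is a sketch under assumptions: the `EngineQuery` class, its field names, and the JSON payload shape are hypothetical, since the actual request format is not specified here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class EngineQuery:
    """Hypothetical query envelope sent from a dApp to the engine."""
    kind: str                          # "explicit" or "implicit"
    dapp_id: str
    keywords: Optional[str] = None     # used by explicit queries
    event: Optional[str] = None        # used by implicit queries, e.g. "click"
    context: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in ("explicit", "implicit"):
            raise ValueError("kind must be 'explicit' or 'implicit'")
        if self.kind == "explicit" and not self.keywords:
            raise ValueError("explicit queries need keywords")
        if self.kind == "implicit" and not self.event:
            raise ValueError("implicit queries need an event")


def to_payload(query: EngineQuery) -> str:
    """Serialize the query and its context as JSON for transport."""
    return json.dumps(asdict(query), sort_keys=True)
```

An explicit query might be built as `EngineQuery(kind="explicit", dapp_id="demo-search", keywords="lens posts")`, and an implicit one as `EngineQuery(kind="implicit", dapp_id="demo-market", event="click", context={"page": "collection/123"})`, where all identifiers are placeholders.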