Version: V2

Data Infrastructure

Developers who use CyberConnect to store users’ social data should be able to safely rely on the protocol to write and update millions of records. At the same time, developers need assurance that CyberConnect does not become a single point of failure that malicious attackers can target. Therefore, we need a sufficiently decentralized data storage system that is performant and provides guarantees of data sovereignty, availability, and integrity.

Data Sovereignty

To achieve data sovereignty, every connection and piece of content on CyberConnect must be signed with a cryptographic key pair, meaning that only the holder of the private key could have produced that connection or content. This mechanism is designed to be both easy to use and future-proof.

When a user interacts with CyberConnect through a dApp for the first time, they create a key pair on the device and publish the public key to the CyberConnect Social Data Network. We support several variants of the Elliptic Curve Digital Signature Algorithm (ECDSA) for compatibility. When a user initiates an action, the previously generated private key is loaded from the local environment to sign the message.
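The flow above can be sketched as follows. This is a minimal illustration, not the CyberConnect implementation: it assumes Node's built-in `crypto` module as a stand-in for the device's key store, the secp256k1 curve, and PEM as the published key format. The helper names `signAction` and `verifyAction` and the action string are hypothetical.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Assumption: secp256k1 stands in for whichever curve a dApp actually uses.
const { publicKey, privateKey } = generateKeyPairSync("ec", {
  namedCurve: "secp256k1",
});

// The public key is what would be published; PEM export is illustrative only.
const publishedPublicKey: string = publicKey
  .export({ type: "spki", format: "pem" })
  .toString();

// Sign an action with the locally held private key (ECDSA over SHA-256).
function signAction(message: string): Buffer {
  return sign("sha256", Buffer.from(message, "utf8"), privateKey);
}

// Anyone holding the published public key can check the signature.
function verifyAction(message: string, signature: Buffer): boolean {
  return verify(
    "sha256",
    Buffer.from(message, "utf8"),
    publishedPublicKey,
    signature,
  );
}

// Hypothetical action payload.
const action = "follow:alice->bob";
const signature = signAction(action);
```

Only the private key holder can produce `signature`, while verification needs nothing but the published public key.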

The following sections explain how nodes in the network store this signed data. Note that the Social Data Network is also compatible with Ceramic’s DID design, so some data can easily be stored on Ceramic.

User Profile

Each user should have a dedicated space for storing relevant social profile information. We adopted Ceramic’s BasicProfile schema as a starting point and added the CyberProfile schema so that users can append arbitrary content blocks. The following is a pseudo-schema in TypeScript. All raw images for the background and avatar are stored on IPFS or other decentralized data stores.

type Profile = {
  /** Profile background picture, hash of IPFS address. */
  backgroundPicture: string;
  /** Bio, up to 1,000 characters. */
  bio: string;
  /** Blocks belonging to the profile, ordered. */
  blocks: Array<Block>;
  /** Display name of the profile, standard: 1-20 characters; letters, numbers, and blanks only. */
  displayName: string;
  /** Handle of the profile. */
  handle: string;
  /** Profile avatar picture, hash of IPFS address. */
  profilePicture: string;
  /** Twitter verification info of the profile. */
  twitterVerification: TwitterVerification;
};
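The length and character limits stated in the schema comments could be checked before a profile is written. The sketch below is illustrative only: the helper names are hypothetical, "letters" is assumed to mean ASCII letters, "blanks" is assumed to mean spaces, and bio length is counted in UTF-16 code units.

```typescript
// Hypothetical helpers; the limits come from the schema comments above.
// Assumption: "letters, numbers, and blanks" means ASCII letters, digits, spaces.
const DISPLAY_NAME_PATTERN = /^[A-Za-z0-9 ]{1,20}$/;
const MAX_BIO_LENGTH = 1000;

// True when the display name is 1-20 characters from the allowed set.
function isValidDisplayName(displayName: string): boolean {
  return DISPLAY_NAME_PATTERN.test(displayName);
}

// True when the bio has at most 1,000 UTF-16 code units.
function isValidBio(bio: string): boolean {
  return bio.length <= MAX_BIO_LENGTH;
}
```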

Idempotent Proof of Connection and Content

Social platforms like Twitter see millions of new connections and posts generated by users every single day. Therefore, the data standard has to be both efficient and easy to work with in order to scale.

Take connection data as an example. A simple approach would be to keep everyone a user follows in a list, appending an address when the user follows someone new and removing it when they unfollow. However, this approach requires the data store and compute engine to provide an ACID guarantee in order to work correctly under concurrency, such as following multiple people at once or following and unfollowing in a short time span. Furthermore, an ACID guarantee does not come cheap in a decentralized setting.

Therefore, we adopt idempotent proofs to describe the most up-to-date state of connections between any two addresses and of content created by users. The benefit of idempotency is easy conflict resolution without an ACID guarantee. Instead of using a data model like a “following list”, where each new followed address gets appended to the same list, we describe each social connection as the final desired state (following or not) of an individual pair of addresses at a given timestamp.

Only one state can exist per pair of addresses per operation; for example, Alice is either following Bob or not following Bob. The connection proof includes the following details of the desired state between an originator and a target address (the content proof has a similar structure):

type Proof = {
  /** Name of the operation, e.g. Follow. */
  name: string;
  /** State of the operation, e.g. true or false for following or not. */
  state: boolean;
  /** The originator address. */
  from: string;
  /** The target address. */
  to: string;
  /** The dApp from which this connection originated. */
  namespace: string;
  /** The network this connection is on. */
  network: string;
  /** Timestamp of the desired state. */
  timestamp: number;
};
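To illustrate why this model resolves conflicts without an ACID guarantee, here is a minimal last-write-wins sketch. It rests on assumptions not stated in the text: the key includes `namespace` and `network` alongside the operation and address pair, the newest timestamp wins, and on equal timestamps the proof already stored is kept. The names `proofKey` and `applyProof` and the sample values are hypothetical.

```typescript
type Proof = {
  name: string;
  state: boolean;
  from: string;
  to: string;
  namespace: string;
  network: string;
  timestamp: number;
};

// Hypothetical key: one slot per operation, address pair, dApp, and network.
function proofKey(p: Proof): string {
  return JSON.stringify([p.name, p.from, p.to, p.namespace, p.network]);
}

// Keep the proof with the latest timestamp. Re-applying a proof is a no-op,
// so duplicates and out-of-order delivery converge to the same state.
function applyProof(store: Map<string, Proof>, incoming: Proof): void {
  const key = proofKey(incoming);
  const current = store.get(key);
  if (current === undefined || incoming.timestamp > current.timestamp) {
    store.set(key, incoming);
  }
}

// Sample values are made up for illustration.
const base = {
  name: "Follow",
  from: "0xAlice",
  to: "0xBob",
  namespace: "demo",
  network: "ETH",
};
const follow: Proof = { ...base, state: true, timestamp: 1 };
const unfollow: Proof = { ...base, state: false, timestamp: 2 };

// Proofs arrive duplicated and out of order.
const store = new Map<string, Proof>();
for (const p of [unfollow, follow, unfollow, follow]) {
  applyProof(store, p);
}
const finalState = store.get(proofKey(follow))?.state;
```

Because every proof carries the full desired state, no write depends on having read the previous one, which is what removes the need for transactional list updates.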

We describe this data standard only in its raw object format. The final message sent to the CyberConnect Social Data Network is encoded with both a digest of the message and a signature produced by the owner. We leave this signing process generic so that it is compatible with Ceramic’s IPLD encoding with DAG-JOSE or with a simpler JSON blob format.
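As one possible reading of the simpler JSON blob format, the sketch below wraps a proof in an envelope carrying the serialized payload, its SHA-256 digest, and an ECDSA signature. The envelope shape (`SignedProof`), the field order, the encodings, and the helper names are all assumptions for illustration; the actual wire format is not specified here.

```typescript
import { createHash, generateKeyPairSync, KeyObject, sign, verify } from "node:crypto";

type Proof = {
  name: string;
  state: boolean;
  from: string;
  to: string;
  namespace: string;
  network: string;
  timestamp: number;
};

// Hypothetical envelope for a JSON blob message.
type SignedProof = { payload: string; digest: string; signature: string };

// A fixed key order makes the same proof always serialize to the same bytes.
const FIELD_ORDER = ["name", "state", "from", "to", "namespace", "network", "timestamp"];

function serialize(p: Proof): string {
  return JSON.stringify(p, FIELD_ORDER);
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Build the envelope: payload, digest of the payload, and owner signature.
function signProof(p: Proof, privateKey: KeyObject): SignedProof {
  const payload = serialize(p);
  const signature = sign("sha256", Buffer.from(payload, "utf8"), privateKey);
  return { payload, digest: sha256Hex(payload), signature: signature.toString("base64") };
}

// Accept the message only if both the digest and the signature match.
function verifyProof(m: SignedProof, publicKey: KeyObject): boolean {
  if (sha256Hex(m.payload) !== m.digest) return false;
  return verify(
    "sha256",
    Buffer.from(m.payload, "utf8"),
    publicKey,
    Buffer.from(m.signature, "base64"),
  );
}

// Usage with a throwaway key pair and made-up addresses.
const keys = generateKeyPairSync("ec", { namedCurve: "secp256k1" });
const message = signProof(
  { name: "Follow", state: true, from: "0xAlice", to: "0xBob", namespace: "demo", network: "ETH", timestamp: 1 },
  keys.privateKey,
);
const tampered: SignedProof = { ...message, payload: message.payload.replace("true", "false") };
```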

Data Integrity & Availability

For a decentralized data network, we must ensure data availability (that data cannot be censored) and data integrity (that data cannot be modified in an unauthorized manner). The CyberConnect protocol adopts a hybrid model for different data use cases depending on the write frequency and amount of data.

For the user profile data model, we use Ceramic directly as the data store. Ceramic already provides data availability through its IPFS-compatible file structures and data integrity by anchoring a Merkle root of new file updates on a blockchain. Since user profile data is written less frequently and the number of files grows linearly with the user base, Ceramic is a good fit here.

For the idempotent proofs, users would collectively create millions of records on a social platform, so it would be impossible to store everything on Ceramic at its current scale while also meeting the high service-level agreement (SLA) that dApp developers expect.

Thus, we designed a mechanism for safely handling writes and updates to connections between users using a hash-linked list and a decentralized data store. Each combination of an address pair (for a connection proof) and an operation creates a new hash-linked list, called an ‘operation log’, upon the first transaction. Each update to the state (for example, changing from following to not following) appends a new node with the new state to the tail of the operation log. Each new state is stored locally on a central server until a batch upload job uploads the tail of each operation log to a decentralized file storage system like Arweave or IPFS.

Every node in an operation log is also stored locally on a central server for caching and serving fast queries. Users can verify data integrity by getting the latest connection state between any two addresses and traversing the hash-linked list, requesting each previous node from the central server. We assume a trusted central server here, since data availability requires the central node to respond to these queries. We believe this is a sufficiently decentralized and highly performant system.
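The operation log and its verification can be sketched as follows. This is a simplified model under stated assumptions: the node layout (`LogNode`), the use of SHA-256 over a JSON array, and the function names are hypothetical, and an in-memory array stands in for the central server. The verifier is assumed to obtain the tail hash from the decentralized store rather than from the server.

```typescript
import { createHash } from "node:crypto";

// Hypothetical node layout: each entry commits to its predecessor via `prev`.
type LogNode = {
  state: boolean;
  timestamp: number;
  prev: string | null;
  hash: string;
};

function hashNode(state: boolean, timestamp: number, prev: string | null): string {
  return createHash("sha256")
    .update(JSON.stringify([state, timestamp, prev]))
    .digest("hex");
}

// Append a new state to the tail of an operation log.
function append(log: LogNode[], state: boolean, timestamp: number): LogNode {
  const tail = log[log.length - 1];
  const prev = tail ? tail.hash : null;
  const node: LogNode = { state, timestamp, prev, hash: hashNode(state, timestamp, prev) };
  log.push(node);
  return node;
}

// Walk from the tail back to the first node, recomputing every hash on the way.
function verifyLog(log: LogNode[], trustedTailHash: string): boolean {
  const byHash = new Map<string, LogNode>();
  for (const n of log) byHash.set(n.hash, n);

  let expected: string | null = trustedTailHash;
  while (expected !== null) {
    const node: LogNode | undefined = byHash.get(expected);
    if (node === undefined) return false; // server could not produce the node
    if (hashNode(node.state, node.timestamp, node.prev) !== expected) return false;
    expected = node.prev;
  }
  return true;
}

// Usage: Alice follows Bob, then unfollows.
const log: LogNode[] = [];
append(log, true, 1);
const tailNode = append(log, false, 2);
const tailHash = tailNode.hash; // what a batch upload would publish
```

Since each hash covers the previous node's hash, changing or withholding any earlier node makes the walk from the published tail fail.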

[Figure: linked list]

Data Privacy

We are also continuously researching privacy-preserving social data networks. Our solution in the CyberConnect protocol will most likely be a combination of the following two approaches.

Encrypted Data

Users should be able to control third-party access to their social data without having to give up ownership or reveal that data to anyone who is not its intended recipient. On the other hand, third-party developers should be able to decrypt that data when they meet a set of required permissions, and lose access if they no longer fulfill the eligibility requirements. Lit Protocol’s solution with Access Control Lists has gained considerable support in this regard, and the CyberConnect team has learned a lot about its strengths and weaknesses.

Zero Knowledge Proof

A zero-knowledge proof is a method by which one party (the prover) can prove to another party (the verifier) that a given statement is true without revealing any information beyond the fact that the statement is true. In a social context, Bob should be able to prove to a third-party social application requiring access to his friend list that he is indeed connected to Alice, without sharing any other information about how, where, or when that connection was made or what else they have in common.
