World Mapping Assistant

A software toolkit that helps individuals build a world model by scanning networks and crawling the Internet peer-to-peer, leveraging the accessibility of their personal computer.

It could be like a game — you have landed in a network of some sort, and your game helps you locate and use the world's resources for your goals.

Imagine that you buy a computer and install this program. The program is a tool (a bit like a gamified, BitTorrent-powered collaborative information retrieval and actualization tool), not a service. To make the game more fun, the program asks you to set your goals and metrics (later on we'll automate resource discovery to support those goals, so initially they serve gamification purposes only); for example, a goal may be "find a friend". Then, you run the program. It runs in the background with a command-line interface that pops up when you click its tray icon or issue a command.

The program starts to collaboratively scan and map available resources — from your hardware to the network. It gathers the initial information, like available hardware, from known locations and existing software (the /proc filesystem, nmap, bluetoothctl, nmcli, and scripts for machine and LAN info), identifying three main addressing spaces: the filesystem address space, the IP address space, and the MAC address space. The program then leverages ipinfo.io data to get information about existing Autonomous Systems and how the IP space is allocated by the regional Internet registries (RIRs). Then, it uses the ICANN Centralized Zone Data Service to determine what domains are publicly available on the Internet: the domain space. The user is able to see the percentage of the IP space covered by the domain space, with the rest of the IP space declared unknown.
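
For illustration, here is a minimal Python sketch of that initial discovery step, assuming a Linux host with nmap installed; the subnet and the use of the free ipinfo.io endpoint are placeholder assumptions, not a fixed design:

import json
import subprocess
import urllib.request

def local_hardware():
    """Filesystem address space: read basic machine info from /proc."""
    with open("/proc/cpuinfo") as f:
        cpus = f.read().count("processor\t:")
    with open("/proc/meminfo") as f:
        mem_kb = int(f.readline().split()[1])
    return {"cpus": cpus, "mem_kb": mem_kb}

def lan_hosts(subnet="192.168.1.0/24"):
    """IP/MAC address spaces: ping-scan the LAN with nmap (no port scan)."""
    out = subprocess.run(["nmap", "-sn", subnet],
                         capture_output=True, text=True).stdout
    return [line.split()[-1].strip("()") for line in out.splitlines()
            if line.startswith("Nmap scan report")]

def public_ip_context():
    """Autonomous System context for our public IP, via ipinfo.io."""
    with urllib.request.urlopen("https://ipinfo.io/json") as r:
        return json.load(r)

if __name__ == "__main__":
    print(local_hardware())
    print(lan_hosts())
    print(public_ip_context().get("org"))  # e.g. "AS13335 Cloudflare, Inc."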

Thereafter, the program allows the user to choose any website and get its structured information, if that information has already been crawled by anyone on-line, keeping the data as easily downloadable as a movie via BitTorrent. The user is able to see which websites out of the domain space have been crawled by anyone, and whether crawlers exist to initiate a new crawl.

The crawled information is stored locally, organized by the protocol it was crawled over and the domain or IP it was crawled from. For example, if we use MongoDB to store crawled data, then a web page crawled over HTTPS from 1.1.1.1, at port 443, would be saved to a database named HTTPS, after the protocol:

$ mongo
> use HTTPS
> db['1.1.1.1:443'].insertOne({'hi': 'there'})

In addition, the program would suggest using an ontological vocabulary of concepts, so that entities saved to the database in free form map the website's own concepts onto this shared vocabulary, in a double-colon (::) notation, like so:

$ mongo
> use HTTPS
> db['1.1.1.1:443/Page::resource:document'].insertOne({'hi': 'there'})

Here, /Page is the name of the entity within the source system, and ::resource:document is the ontological vocabulary term classifying it across all categorization systems. The vocabulary may be the one described in the idea Network of Functions, or similar, but is subject to versioning and evolution.
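
As a sketch only, a client could build such collection names programmatically; the helper name and the use of pymongo here are illustrative assumptions, not part of the idea itself:

from pymongo import MongoClient

def collection_name(host, port, entity, vocab_term):
    """Build '1.1.1.1:443/Page::resource:document'-style collection names."""
    return f"{host}:{port}/{entity}::{vocab_term}"

client = MongoClient()        # local MongoDB
db = client["HTTPS"]          # database named after the protocol
name = collection_name("1.1.1.1", 443, "Page", "resource:document")
db[name].insert_one({"hi": "there"})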

For example, I suggest the V1 of the network resource vocabulary be:

{
    '::category': ['.goal', '.concept', '.question'],
    '::system':   ['.project', '.structure', '.mechanism', '.organization', '.person', '.process', '.api'],
    '::method':   ['.idea', '.principle', '.function', '.invention', '.equipment', '.product', '.endpoint'],
    '::location': ['.address', '.account', '.url'],
    '::request':  ['.CATEGORY', '.query', '.order', '.operation', '.task', '.call'],
    '::resource': ['.record', '.document', '.METHOD', '.SYSTEM'],
}

With vocabulary versioning, the naming would look like this:

$ mongo
> use HTTPS
> db['1.1.1.1:443/Page::v1/resource:document'].insertOne({'hi': 'there'})

Note: the capitalized .CATEGORY, .METHOD, and .SYSTEM entries within the vocabulary are aliases, meaning that if you have a ::resource that looks like a system, you should not use the ::resource namespace, but use the ::system namespace instead.
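
To make the alias rule concrete, here is a rough sketch of how a client might validate a versioned term such as v1/resource:document against the V1 vocabulary above; the stripped-down key names and parsing details are assumptions for illustration:

# V1 vocabulary, with leading '::' and '.' stripped for lookup.
VOCAB = {
    "v1": {
        "category": ["goal", "concept", "question"],
        "system":   ["project", "structure", "mechanism", "organization",
                     "person", "process", "api"],
        "method":   ["idea", "principle", "function", "invention",
                     "equipment", "product", "endpoint"],
        "location": ["address", "account", "url"],
        "request":  ["CATEGORY", "query", "order", "operation", "task", "call"],
        "resource": ["record", "document", "METHOD", "SYSTEM"],
    }
}

def validate(term):
    """Return True for terms like 'v1/resource:document'; reject aliases,
    which only point the user to another namespace (e.g. ::system)."""
    version, _, rest = term.partition("/")
    namespace, _, value = rest.partition(":")
    values = VOCAB.get(version, {}).get(namespace, [])
    if value not in values:
        return False
    if value.isupper():  # .CATEGORY/.METHOD/.SYSTEM are aliases
        raise ValueError(f"use the ::{value.lower()} namespace instead")
    return True

print(validate("v1/resource:document"))  # True
print(validate("v1/location:url"))       # True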

This approach results in a local database collected by the program, with databases named after protocols, which may look something like this:

FTP         114.330GB
HTTP        5.340TB
HTTPS       153.481TB
IMAP        138.183GB
POP         11.380GB
XMPP        1.423GB
admin       0.000GB
config      0.000GB
local       0.027GB

You can imagine many more protocols, like: TELNET, SSH, FTP, SMTP, DNS, AMQP, HTTPS:ATOM, HTTPS:RSS, BITCOIN, BITTORRENT, EDONKEY, FREENET, IRC, IMAP, IPFS, LDAP, HTTP, HTTPS, MIME, MQTT, NNTP, NTCIP, NTP, POP, RTP, RTSP, SIP, TOR, TOX, XMPP, Z3950, TELEGRAM, ETHEREUM, HTTPS:GRAPHQL, HTTPS:REST, etc.

Each crawler would define its own names for the objects it crawls, e.g., /Page, but use the shared vocabulary version to add metadata for future data alignment, like ::resource:document.
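
As an illustration of that contract, a crawler could be as simple as a generator yielding (entity name, vocabulary term, document) triples that the assistant then stores under the naming scheme above; the names below are hypothetical:

import urllib.request

def crawl(url):
    """Yield (entity_name, vocab_term, document) triples for one page.
    The entity name ('Page') is the crawler's own choice; the vocabulary
    term ('resource:document') is shared metadata for future alignment."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    yield "Page", "resource:document", {"url": url, "html": html}

for entity, term, doc in crawl("https://example.org/"):
    print(entity, term, list(doc))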

The crawlers would be stored in public git repositories, and the crawled data would be shared as BitTorrent seeds. This way, the world mapping assistant would automatically seed the data it crawls, and people would have quick access to structured data of the Internet for computing, outreach and operations purposes.
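
One way the seeding could be automated, sketched under the assumption that mongodump and the libtorrent Python bindings are installed (the tracker URL and paths are placeholders):

import subprocess
import libtorrent as lt

# 1. Dump the HTTPS database to ./dump/HTTPS/ with mongodump.
subprocess.run(["mongodump", "--db", "HTTPS", "--out", "dump"], check=True)

# 2. Build a .torrent for the dump directory.
fs = lt.file_storage()
lt.add_files(fs, "dump/HTTPS")
t = lt.create_torrent(fs)
t.add_tracker("udp://tracker.opentrackr.org:1337/announce")  # example tracker
lt.set_piece_hashes(t, "dump")  # parent directory of the added files
with open("HTTPS.torrent", "wb") as f:
    f.write(lt.bencode(t.generate()))
# The assistant would then seed HTTPS.torrent and announce it to peers.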

Follow-up thoughts:

  • Crawling leads could be generated by monitoring what websites people visit, and how many times, giving the user statistics on what kinds of sources may be interesting to crawl automatically (see the sketch below). This is very easy to do with a browser plugin, but may also be done by monitoring one's own OS traffic.
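
For instance, a rough sketch of a lead generator that reads visit counts from a copy of Firefox's places.sqlite (the profile path varies per system; other browsers would need their own readers):

import sqlite3

def crawl_leads(places_db="places.sqlite", top=20):
    """Return the most-visited URLs as candidate crawl targets."""
    con = sqlite3.connect(places_db)
    rows = con.execute(
        "SELECT url, visit_count FROM moz_places "
        "ORDER BY visit_count DESC LIMIT ?", (top,)
    ).fetchall()
    con.close()
    return rows

for url, visits in crawl_leads():
    print(visits, url)
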
Mindey,




You forgot to mention any scanner in the radio frequency spectrum.


What about GNU-Radio, gqrx-sdr and gqrx-scanner? Related thoughts on friend-seeking using radio waves: https://ieeexplore.ieee.org/abstract/document/5641096


According to lifeboat's description of web3.0: "For example, services that use the semantic web, microformats, natural language search, data mining, machine learning, recommendation agents, and artificial intelligence technologies; these services emphasize the understanding of information by machines to provide a more effective and intuitive user experience" — this modeling assistant is building web3.0!



Umm, yeah. If you add the capability to send signals in an arbitrary protocol, such an assistant could be extended to a "universal peer" (i.e., you don't need an Ethereum peer, because you are the universal peer, free to decide which protocol-defined peer group to collaborate with). The Internet is already decentralized; it's just that the peer size distribution is skewed (it tends to follow a power law), just like the wealth distribution, with very few very big nodes holding lots of resources, and most nodes with little.