Key Responsibilities:
Develop Browser Extension:
Create a browser extension to monitor and scrape real-time data from web interfaces (Telegram, Discord, and WeChat) by detecting DOM changes.
Implement mechanisms to handle web interfaces that lack direct APIs by leveraging browser automation tools (e.g., Puppeteer).
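For candidates gauging the scope, the DOM-change detection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`extractAddedText`, `watchContainer`) and the `MutationLike` shapes are hypothetical, and a real content script would need platform-specific selectors that this posting does not specify. The browser's standard `MutationObserver` is looked up at runtime so the extraction logic can also be exercised outside a browser.

```typescript
// Hypothetical shapes: only the fields this sketch reads from a DOM MutationRecord.
interface AddedNodeLike { textContent: string | null }
interface MutationLike { addedNodes: ArrayLike<AddedNodeLike> }

/** Collect the trimmed, non-empty text of every node added in a batch of mutations. */
function extractAddedText(records: ReadonlyArray<MutationLike>): string[] {
  const texts: string[] = [];
  for (const record of records) {
    // NodeList is array-like, so an index loop works in browsers and in tests.
    for (let i = 0; i < record.addedNodes.length; i++) {
      const text = (record.addedNodes[i].textContent ?? "").trim();
      if (text.length > 0) texts.push(text);
    }
  }
  return texts;
}

/**
 * Watch a container element for newly added nodes.
 * Returns a stop function, or null when MutationObserver is unavailable
 * (for example, when run outside a browser).
 */
function watchContainer(
  container: unknown,
  onText: (texts: string[]) => void
): (() => void) | null {
  const Observer = (globalThis as any).MutationObserver;
  if (typeof Observer !== "function") return null;
  const observer = new Observer((records: MutationLike[]) => {
    const texts = extractAddedText(records);
    if (texts.length > 0) onText(texts);
  });
  // childList + subtree reports nodes added anywhere below the container.
  observer.observe(container, { childList: true, subtree: true });
  return () => observer.disconnect();
}
```

In a content script, `watchContainer` would presumably be pointed at the element that holds the message list, with the callback forwarding the text to the rest of the pipeline.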
Real-Time Data Scraping:
Design a system to scrape messages and data from the specified platforms in real-time, ensuring the ability to identify and label the sources of these messages.
Implement features to aggregate, analyze, and categorize scraped data.
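To make the source-labeling and aggregation requirement more concrete, one possible shape is sketched below. Everything here is an assumption for illustration: the `LabeledMessage` fields, the function names, and the hostname mapping (the hostnames each platform's web client actually uses should be verified before relying on them).

```typescript
type Platform = "telegram" | "discord" | "wechat";

// Hypothetical record shape for one scraped message.
interface LabeledMessage {
  platform: Platform;
  channel: string; // chat or channel name as read from the page
  author: string;
  text: string;
  seenAt: string;  // ISO 8601 time at which the scraper saw the message
}

/** Map a page hostname to a platform label, or null for unrecognized hosts. */
function platformFromHost(hostname: string): Platform | null {
  // Assumed hostnames; confirm against the web clients actually in use.
  if (hostname === "web.telegram.org") return "telegram";
  if (hostname === "discord.com") return "discord";
  if (hostname === "web.wechat.com" || hostname === "wx.qq.com") return "wechat";
  return null;
}

/** Count messages per "platform:channel" source key. */
function countBySource(messages: ReadonlyArray<LabeledMessage>): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const m of messages) {
    const key = `${m.platform}:${m.channel}`;
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

Attaching the label at capture time, rather than inferring it later, keeps every stored row traceable to its platform and channel.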
Data Storage and Management:
Store all scraped data, including message content and metadata, in a PostgreSQL database.
Utilize the pgvector extension to efficiently handle and store vector embeddings of the message data for advanced querying and analysis.
Ensure that the database is optimized for handling large volumes of data and supports efficient retrieval and processing.
Implement an endpoint for querying the stored data by user query and other filter parameters.
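As a rough sketch of what the query endpoint might run against PostgreSQL with pgvector, the function below builds a parameterized similarity search. The table and column names (`messages`, `platform`, `content`, `sent_at`, `embedding`), the `SearchFilters` fields, and the function name are hypothetical and not taken from this posting; `<=>` is pgvector's cosine-distance operator, and how the query embedding is produced is left open.

```typescript
// Hypothetical filter parameters the endpoint might accept.
interface SearchFilters {
  platform?: string;
  since?: string; // ISO 8601 lower bound on message time
  limit?: number;
}

interface SqlQuery { text: string; values: unknown[] }

/** Build a parameterized pgvector similarity query with optional filters. */
function buildSearchQuery(queryEmbedding: number[], filters: SearchFilters = {}): SqlQuery {
  // pgvector accepts a vector as a text literal such as '[0.1,0.2,0.3]'.
  const values: unknown[] = [`[${queryEmbedding.join(",")}]`];
  const where: string[] = [];

  if (filters.platform !== undefined) {
    values.push(filters.platform);
    where.push(`platform = $${values.length}`);
  }
  if (filters.since !== undefined) {
    values.push(filters.since);
    where.push(`sent_at >= $${values.length}`);
  }

  // Clamp the page size so a caller cannot request an unbounded result set.
  values.push(Math.min(Math.max(filters.limit ?? 20, 1), 100));

  const text =
    "SELECT id, platform, content, sent_at, embedding <=> $1::vector AS distance " +
    "FROM messages " +
    (where.length > 0 ? `WHERE ${where.join(" AND ")} ` : "") +
    `ORDER BY embedding <=> $1::vector LIMIT $${values.length}`;

  return { text, values };
}
```

The resulting `text` and `values` could then be handed to a PostgreSQL client that supports numbered placeholders. Keeping all user input in `values` rather than in the SQL string avoids injection through filter parameters.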
Infrastructure Setup and Management:
Deploy the browser extension on a cloud-based virtual environment with a full desktop operating system capable of handling resource-intensive processes.
Set up and manage a scalable infrastructure that supports high-frequency data scraping, ensuring that it remains operational 24/7.
Monitoring and Healing:
Implement monitoring tools to detect and address common issues such as IP bans, account bans, and CAPTCHA challenges.
Develop strategies for automatic recovery from these issues, including using rotating proxies, multiple accounts, and other countermeasures.
Security and Compliance:
Ensure that all data scraping activities comply with applicable laws and with each platform's terms of service.
Implement security measures to protect the infrastructure and scraped data from unauthorized access and other security threats.
Required Skills:
Browser Extension Development: Proficiency in creating and managing browser extensions, particularly for Chrome or other Chromium-based browsers.
Real-Time Data Scraping: Experience with tools like Puppeteer, Selenium, or similar for web scraping and browser automation.
PostgreSQL Database Management: Expertise in managing PostgreSQL databases, including the use of the pgvector extension for handling vector data.
Cloud Infrastructure Management: Knowledge of deploying and managing applications on cloud platforms (e.g., AWS, Google Cloud) and configuring virtual desktop environments.
Problem-Solving: Ability to implement solutions for common web scraping challenges, including IP and account bans, CAPTCHA bypassing, and managing multiple user sessions.
Monitoring and Automation: Experience in setting up monitoring tools and automation scripts to maintain uptime and address issues proactively.
Security and Compliance Awareness: Understanding of legal implications and best practices in web scraping and data handling.
Nice to Have:
Familiarity with machine learning models for data analysis and decision-making based on scraped data.
Experience with vector databases and integrating them with large language models for data summarization.
Background in financial or trading environments, especially with cryptocurrency markets.
Project Timeline:
This project is expected to begin immediately and will be rolled out in phases. Initial deployment and basic functionalities should be delivered within 4-6 weeks, followed by iterative improvements and feature expansions.
How to Apply:
Please provide examples of similar projects you have completed, particularly those involving browser extension development, real-time scraping, PostgreSQL databases with pgvector, and infrastructure management. Highlight any specific challenges you encountered and how you addressed them. If qualified, you will be asked to complete a one-hour test task to demonstrate your development, DevOps, and soft skills.
Hourly Range: $30.00-$45.00
Posted On: August 14, 2024 14:09 UTC
Category: Scripting & Automation
Skills: TypeScript, Puppeteer, PostgreSQL, Telegram, DevOps
Country: United Kingdom
