watercrawl/WaterCrawl 0.0.1
Release version 0.0.1 - Beta

8 months ago

Added

Initial release of WaterCrawl
Core web crawling functionality using Scrapy (v2.12.0)
Django-based web application (v5.1.4)
REST API using Django REST Framework (v3.15.2)
Asynchronous task processing with Celery (v5.4.0)
Redis integration for task queue management
MinIO integration for file storage
User authentication and authorization system
OpenAI integration capabilities
Docker support with multi-container setup
Swagger/OpenAPI documentation using drf-spectacular

Features

Web page crawling and data extraction
Asynchronous task management
User management system
API documentation
Containerized deployment support
Scalable architecture with separate services
Database integration with PostgreSQL
File storage system using MinIO
Celery beat for scheduled tasks
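
As an illustration of the scheduled-task feature, a Celery beat schedule is typically declared as a mapping from entry names to a task path and an interval. The sketch below uses only the standard library and is not taken from the WaterCrawl code base: the task path `crawler.tasks.recrawl`, the entry name, and the helper `due_entries` are hypothetical. The dictionary shape (`task`, `schedule`, `args`) follows Celery's documented `beat_schedule` format, where `schedule` may be a number of seconds or a `timedelta`.

```python
from datetime import timedelta

# Hypothetical schedule in the shape Celery expects for `beat_schedule`.
BEAT_SCHEDULE = {
    "recrawl-every-hour": {
        "task": "crawler.tasks.recrawl",   # assumed task path, for illustration only
        "schedule": timedelta(hours=1),    # run once per hour
        "args": ("https://example.com",),
    },
}


def due_entries(schedule, seconds_since_last_run):
    """Return the names of entries whose interval has fully elapsed.

    `seconds_since_last_run` maps entry names to elapsed seconds.
    Entries missing from that mapping are treated as never run, hence due.
    """
    due = []
    for name, entry in schedule.items():
        interval = entry["schedule"]
        # Accept either a timedelta or a plain number of seconds.
        if isinstance(interval, timedelta):
            interval = interval.total_seconds()
        elapsed = seconds_since_last_run.get(name)
        if elapsed is None or elapsed >= interval:
            due.append(name)
    return due
```

In a real deployment the beat process does this bookkeeping itself; the helper only makes the interval semantics explicit.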

Dependencies

Python 3.11+
Django 5.1.4
Scrapy 2.12.0
Celery 5.4.0
Redis (latest)
PostgreSQL 17.2
Nginx
Gunicorn WSGI server
MinIO (optional, can use S3 or local storage)
Additional dependencies listed in requirements.txt

Infrastructure Components

Web Application Server (Gunicorn)
Celery Worker with Beat Scheduler
Nginx Web Server
PostgreSQL Database
Redis for Caching and Message Broker
MinIO/S3 for Object Storage (optional)
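
The components above are usually wired together through environment variables in the Django settings. The following is a minimal sketch under stated assumptions, not WaterCrawl's actual configuration: the function `build_settings`, the variable names (`POSTGRES_HOST`, `REDIS_URL`, `MINIO_ENDPOINT`, and so on), the default host names, and the `OBJECT_STORAGE_ENDPOINT` key are all hypothetical. Only the Django backend paths and the `CELERY_BROKER_URL` convention are standard.

```python
import os


def build_settings(env=None):
    """Assemble Django/Celery-style settings from environment variables."""
    env = os.environ if env is None else env
    # One Redis instance serves as both cache and Celery message broker.
    redis_url = env.get("REDIS_URL", "redis://redis:6379/0")

    settings = {
        "DATABASES": {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env.get("POSTGRES_DB", "postgres"),
                "USER": env.get("POSTGRES_USER", "postgres"),
                "PASSWORD": env.get("POSTGRES_PASSWORD", ""),
                "HOST": env.get("POSTGRES_HOST", "db"),
                "PORT": env.get("POSTGRES_PORT", "5432"),
            }
        },
        "CACHES": {
            "default": {
                "BACKEND": "django.core.cache.backends.redis.RedisCache",
                "LOCATION": redis_url,
            }
        },
        "CELERY_BROKER_URL": redis_url,
    }

    # Object storage is optional: fall back to local files when no
    # MinIO/S3 endpoint is configured (hypothetical keys and path).
    endpoint = env.get("MINIO_ENDPOINT")
    if endpoint:
        settings["OBJECT_STORAGE_ENDPOINT"] = endpoint
    else:
        settings["MEDIA_ROOT"] = env.get("MEDIA_ROOT", "/app/media")
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the mapping easy to inspect, for example `build_settings({"POSTGRES_HOST": "pg"})["DATABASES"]["default"]["HOST"]` returns `"pg"`.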
