06/20/2024
Web Scraping - Scrapy Expert
Experience: 2+ Years
Location: Work From Home/India
Salary: INR 3 to 6 LPA
No. of positions: 2
Industry: Automation/Data Processing
Notice Period: Immediate
JOB DESCRIPTION:
We are seeking a talented and results-oriented Web Scraping Expert to join our team. You will be responsible for developing and maintaining robust web scraping pipelines using Scrapy, a high-performance Python framework. You will leverage your expertise in Python and web scraping techniques to extract valuable data from various online sources in a reliable and scalable manner.
RESPONSIBILITIES:
Design, develop, and maintain web scraping projects using Scrapy, including:
Crawling websites to extract specific data points; implementing efficient crawling strategies to handle pagination, dynamic content, and complex website structures; employing data extraction techniques with CSS selectors and XPath; processing and cleaning scraped data using Python libraries (e.g., Pandas, NumPy); and storing extracted data in appropriate formats (e.g., CSV, JSON, databases).
Collaborate with data engineers and analysts to identify data needs and define scraping requirements.
Write well-documented, maintainable, and efficient Scrapy code.
Integrate scraping pipelines with other Python frameworks and tools as needed.
Stay up-to-date with the latest web scraping trends and best practices.
Troubleshoot and address challenges related to rate limiting, authentication, and website changes.
Implement caching strategies for efficient data retrieval and reduced website load.
Consider potential ethical and legal implications of web scraping.
QUALIFICATIONS:
Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
Proven experience in web scraping, data extraction, and automation using tools like BeautifulSoup, Scrapy, Selenium, etc.
Proficiency in programming languages such as Python, JavaScript, or Ruby, with a focus on writing clean, maintainable code.
Strong understanding of web technologies (HTML, CSS, XPath, DOM manipulation, AJAX, etc.) and web protocols (HTTP, HTTPS).
Familiarity with database systems and data storage solutions, including SQL and NoSQL databases.
Excellent problem-solving skills and attention to detail, with the ability to analyze complex data structures and troubleshoot issues effectively.
Good communication and teamwork skills, with the ability to collaborate with stakeholders from diverse backgrounds.