Thread: ROCKPRO64 for SCRAPING (/showthread.php?tid=7778)
Forum: PINE64 (https://forum.pine64.org), ROCKPRO64, RockPro64 Projects, Ideas and Tutorials

ROCKPRO64 for SCRAPING - Ulthor_31 - 07-30-2019

Hi all,

I am new to the forum, and I apologize in advance for my poor English.

My project is to build a small device that scrapes websites such as Amazon. The device will run 24 hours a day, and each day it will scrape the whole list of web pages again. To scrape the pages, I use Python 3.6 with Selenium and BeautifulSoup.

I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10000 websites:

Pi3+ : scrapes 100 web pages/hour
XU4 : 360 web pages/hour

The goal is to reach 600 web pages/hour.

A few days ago I found the RockPro64 with its multi-core processor, and I think this device could be the solution to reach my objective.

------------
My cart:
ROCKPro64 2GB/4GB
7″ LCD TOUCH SCREEN PANEL
ROCKPro64 2×2 MIMO Dual Band WIFI 802.11AC/BLUETOOTH 4.2 MODULE
ROCKPro64 PLAYBOX ENCLOSURE
ROCKPro64 12V 5A POWER SUPPLY or ROCKPro64 12V 3A POWER SUPPLY
16GB eMMC Module
------------

Before I validate my cart, I would like to know how long a RockPro64 takes to open a given number of web pages (that is, how many web pages per hour it can handle). Could someone help me by running this Python test on a RockPro64?
Please share the elapsed time with me. If it works, I will add 10 RockPro64 boards to my cart.

Thank you very much.

---------------PYTHON CODE------------
import time

from bs4 import BeautifulSoup
from selenium import webdriver

#---CHROME OPTION AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
# Block image loading (2 = block)
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver', options=chrome_options)
driver = webdriver.Chrome(options=chrome_options)
driver.set_window_size(2000, 1000)

#---LIST of WEBPAGE---
urls = [
    'https://www.amazon.fr/dp/B07SJFD9N4',
    'https://www.amazon.fr/dp/B07S6FB64X',
    'https://www.amazon.fr/dp/B07Q2J5NCZ',
    'https://www.amazon.fr/dp/B07QR9MDXS',
    'https://www.amazon.fr/dp/B07Q2LSS1P',
    'https://www.amazon.fr/dp/B07RBTKJ7H',
    'https://www.amazon.fr/dp/B07PZFNG1F',
    'https://www.amazon.fr/dp/B07HG3XG7Q',
    'https://www.amazon.fr/dp/B07DDFC9B9',
    'https://www.amazon.fr/dp/B07Q4216T1',
]

#---START TIME---
start = time.time()

#---Show WEBPAGE---
for url in urls:
    driver.get(url)
    # Scroll down by 10000 px
    driver.execute_script('window.scrollBy(0,10000);')
    print('-----------> ' + url)
    time.sleep(1)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')

#---END TIME---
end = time.time()

#---ELAPSED---
elapsed = end - start
print(elapsed)

RE: ROCKPRO64 for SCRAPING - evilbunny - 07-31-2019

(07-30-2019, 06:47 AM)Ulthor_31 Wrote: I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10000 websites.

I ran your code on a pristine Buster install of Armbian and it output the following:

Code:
# python3 test.py

There are a few things to note. Firstly, I was SSH'd into the system, so the GUI was forwarded over X.
I'm in Australia, so .fr lookups may be faster if you are in France.

RE: ROCKPRO64 for SCRAPING - Ulthor_31 - 08-01-2019

Thanks very much, EvilBunny.

Could you try again with this Python script? I only changed the URLs to amazon.com.au.

---------------PYTHON CODE------------
import time

from bs4 import BeautifulSoup
from selenium import webdriver

#---CHROME OPTION AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
# Block image loading (2 = block)
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver', options=chrome_options)
driver = webdriver.Chrome(options=chrome_options)
driver.set_window_size(2000, 1000)

#---LIST of WEBPAGE---
urls = [
    'https://www.amazon.com.au/dp/B07HPKWGJH',
    'https://www.amazon.com.au/dp/B07FTN21JL',
    'https://www.amazon.com.au/dp/B07HPCDHQS',
    'https://www.amazon.com.au/dp/B07FTN21JL',
    'https://www.amazon.com.au/dp/B079DQ7JK6',
    'https://www.amazon.com.au/dp/B07CBPS16T',
    'https://www.amazon.com.au/dp/B07CBP38HS',
    'https://www.amazon.com.au/dp/B073R3MJ87',
    'https://www.amazon.com.au/dp/B01LZG4KPC',
    'https://www.amazon.com.au/dp/B077DX1NFH',
]

#---START TIME---
start = time.time()

#---Show WEBPAGE---
for url in urls:
    driver.get(url)
    # Scroll down by 10000 px
    driver.execute_script('window.scrollBy(0,10000);')
    print('-----------> ' + url)
    time.sleep(1)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')

#---END TIME---
end = time.time()

#---ELAPSED---
elapsed = end - start
print(elapsed)
-----

Thank you again.

RE: ROCKPRO64 for SCRAPING - evilbunny - 08-01-2019

(08-01-2019, 03:51 AM)Ulthor_31 Wrote: Thanks very much, EvilBunny.

It was slightly faster: 87.21881079673767

RE: ROCKPRO64 for SCRAPING - Ulthor_31 - 08-01-2019

(08-01-2019, 04:01 AM)evilbunny Wrote: (08-01-2019, 03:51 AM)Ulthor_31 Wrote: Thanks very much, EvilBunny.

Thank you very much. I am going on holiday, so I will validate my cart in September.
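
For reference, the reported time can be converted into a pages-per-hour figure and compared with the 600 pages/hour goal. The sketch below is only rough arithmetic under stated assumptions: the 87.2 figure is taken to be in seconds (it is the difference of two time.time() calls) and to cover the 10 URLs in the test list. The helper names pages_per_hour and boards_needed are made up for illustration and do not come from the thread. The run was also done over SSH with X forwarding, so a board with a local display may give a different number.

```python
import math


def pages_per_hour(pages: int, elapsed_seconds: float) -> float:
    """Extrapolate an hourly rate from a timed run (hypothetical helper)."""
    return pages * 3600.0 / elapsed_seconds


def boards_needed(target_per_hour: float, per_board_per_hour: float) -> int:
    """Number of boards needed in parallel, assuming they scale linearly."""
    return math.ceil(target_per_hour / per_board_per_hour)


# Assumption: 10 pages were opened in the reported 87.21881079673767 seconds.
rate = pages_per_hour(10, 87.21881079673767)
print(f"{rate:.0f} pages/hour")

# Compare against the 600 pages/hour goal from the first post.
print(boards_needed(600, rate), "board(s) for 600 pages/hour")
```

Under these assumptions a single board lands at roughly 413 pages/hour, which is above the Odroid XU4 figure of 360 but below the 600 target, so that target would need two boards running in parallel.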