07-30-2019, 06:47 AM
Hi all,
I am new to the forum, and I apologize in advance for my poor English.
My project is to build a small device that scrapes websites such as Amazon. The device will run 24 hours a day, and each day it will scrape the whole list of web pages again.
To scrape the pages, I use Python 3.6 with Selenium and BeautifulSoup.
I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10,000 web pages.
Pi 3+: it scrapes 100 web pages/hour
XU4: 360 web pages/hour
The goal is to reach 600 web pages/hour.
A few days ago I found the RockPro64 with its six-core CPU (dual-core plus quad-core), and I think this board could be the solution to reach my goal.
------------
My cart:
ROCKPro64 2GB/4GB
7″ LCD TOUCH SCREEN PANEL
ROCKPro64 2×2 MIMO Dual Band WIFI 802.11AC/BLUETOOTH 4.2 MODULE
ROCKPro64 PLAYBOX ENCLOSURE
ROCKPro64 12V 5A POWER SUPPLY or ROCKPro64 12V 3A POWER SUPPLY
16GB eMMC Module
------------
Before I place the order, I would like to know how long a RockPro64 takes to open a given number of web pages, i.e. how many web pages/hour it can reach.
Could someone help me by running this Python test on a RockPro64 and sharing the elapsed time?
If it works, I will add 10 RockPro64 boards to my cart.
Thank you very much.
---------------PYTHON CODE------------
import time

from bs4 import BeautifulSoup
from selenium import webdriver

#---CHROME OPTION AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
# Block image loading to speed up page loads
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
# options= replaces the deprecated chrome_options= keyword in recent Selenium releases
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver', options=chrome_options)
driver = webdriver.Chrome(options=chrome_options)
driver.set_window_size(2000, 1000)
#---LIST of WEBPAGE---
urls = [
    'https://www.amazon.fr/dp/B07SJFD9N4',
    'https://www.amazon.fr/dp/B07S6FB64X',
    'https://www.amazon.fr/dp/B07Q2J5NCZ',
    'https://www.amazon.fr/dp/B07QR9MDXS',
    'https://www.amazon.fr/dp/B07Q2LSS1P',
    'https://www.amazon.fr/dp/B07RBTKJ7H',
    'https://www.amazon.fr/dp/B07PZFNG1F',
    'https://www.amazon.fr/dp/B07HG3XG7Q',
    'https://www.amazon.fr/dp/B07DDFC9B9',
    'https://www.amazon.fr/dp/B07Q4216T1',
]
#---START TIME---
start = time.time()
#---Show WEBPAGE---
for url in urls:
    driver.get(url)
    # Scroll down so content loaded on scroll is requested
    driver.execute_script('window.scrollBy(0,10000);')
    print('-----------> ' + url)
    time.sleep(1)
    # Parse the rendered page with BeautifulSoup
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')
#---END TIME---
end = time.time()
#---ELAPSED---
elapsed = end - start
print(elapsed)
# Close the browser once the timing is done
driver.quit()
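To make reported times easier to compare with the Pi 3+ and XU4 figures, the elapsed time can be converted to web pages/hour. This is only a minimal sketch: the helper names `pages_per_hour` and `hours_for_full_list` are illustrative, the 60-second timing is a made-up example and not a measurement, and 10,000 is used as a round figure for the size of the full list.

```python
def pages_per_hour(elapsed_seconds, page_count):
    """Convert a timed run (seconds, number of pages) into pages per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return page_count * 3600.0 / elapsed_seconds


def hours_for_full_list(total_pages, rate_per_hour):
    """Hours needed to scrape total_pages at a given pages-per-hour rate."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return total_pages / rate_per_hour


# Hypothetical timing: the 10-page test finishing in 60 seconds
rate = pages_per_hour(60.0, 10)
print(rate)                                        # 600.0
print(round(hours_for_full_list(10000, rate), 1))  # 16.7
```

With the test script, `elapsed` and the length of the URL list (10) would be the two inputs to `pages_per_hour`.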