Yes. I’m fucked. Sorry, I could not find a better headline than this. I’m truly fucked.
Here is the complete story: It all started in late 2016, when I was working on a bot to extract some quality information from a website’s API. The website is extremely reputable and No. 1 in its industry.
Fortunately, while I was debugging their API I found a flaw: a loophole that allowed me to bypass the firewall and the rate limiting, meaning I could send hundreds of requests without being blocked or rate limited. I had been using this flaw from that day until yesterday.
Yesterday, while I was checking my bot’s logs, I found that it has not worked for 7-8 days. After debugging for a while I realized that those sons of bitches patched the vulnerability.
I’m completely fucked because that’s 40% of my income. Without the bot I can’t extract the information. The bot is completely broken.
Solutions: After banging my head against the wall I have 1 solution in mind, which is proxies.
First I should state that Tor is not an option. Cloudflare really hates the Tor network. Yesterday I tried running the bot via the Tor network and the result was pathetic.
Bypassing the Cloudflare DDoS protection is not a big deal. The main problem is rate limiting. If I make more than 3 requests to the API from the same IP, they throw me a 429 error.
After googling for a while I could not find a reliable free proxy service. So I have to pay for a good proxy service; the closest I could find is pubproxy. Their premium service is $12 a month, which I can bear.
Oh, I forgot to mention that I have to make at least 70k requests. Previously I was hitting their API with 70k requests every day, but around 6 months ago I started hitting it only every 3 days. With the current circumstances I think I’m gonna hit it every 7 days.
Now the plan is to completely rewrite the script in such a way that every proxy hits 3 URLs and then the proxy is changed. Here is the structure for better understanding.
api_url_list.txt = 70k API URLs (one per line)
proxy_list.txt = Let’s just say 20 proxies.
The script will load the api_url_list.txt list as an array, then start fetching info using the proxies from proxy_list.txt. Once a proxy has been used 3 times, the script will skip the current proxy and move on to the next one. This way the proxy list runs in a loop, and by the time it comes back around to the first proxy 60 seconds will have passed and I can request 3 more times with it… A complete loop until every URL has been fetched.
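To make the loop described above concrete, here is a minimal Python sketch of just the rotation logic. It is only an illustration under assumptions, not a finished bot: the names `ProxyRotator`, `acquire`, `run`, and `min_total_seconds` are made up for this example, `fetch` is a placeholder you would supply yourself, and the "3 requests per 60 seconds per IP" quota is taken from the description above rather than from any documentation of the API.

```python
import math
import time


class ProxyRotator:
    """Hand out proxies round-robin, `per_proxy` uses each per `window` seconds."""

    def __init__(self, proxies, per_proxy=3, window=60.0, clock=time.monotonic):
        if not proxies:
            raise ValueError("need at least one proxy")
        self.proxies = list(proxies)
        self.per_proxy = per_proxy   # assumed quota: 3 requests per IP
        self.window = window         # assumed reset time: 60 seconds
        self.clock = clock           # injectable so the logic can be tested
        self._index = 0              # position of the current proxy
        self._used = 0               # uses of the current proxy in this batch
        self._last_use = {}          # proxy -> time of its most recent use

    def acquire(self):
        """Return (proxy, wait_seconds) for the next request.

        wait_seconds is how long to sleep first so that this proxy's
        previous batch is at least one full window old. It assumes the
        caller sends the request right after sleeping.
        """
        if self._used == self.per_proxy:
            # Quota used up: skip to the next proxy, wrapping around.
            self._index = (self._index + 1) % len(self.proxies)
            self._used = 0
        proxy = self.proxies[self._index]
        now = self.clock()
        wait = 0.0
        if self._used == 0 and proxy in self._last_use:
            wait = max(0.0, self._last_use[proxy] + self.window - now)
        self._used += 1
        self._last_use[proxy] = now + wait
        return proxy, wait


def run(urls, rotator, fetch, sleep=time.sleep):
    """Fetch every URL in order; `fetch(url, proxy)` is supplied by the caller."""
    for url in urls:
        proxy, wait = rotator.acquire()
        if wait > 0:
            sleep(wait)
        fetch(url, proxy)


def min_total_seconds(n_urls, n_proxies, per_proxy=3, window=60.0):
    """Lower bound on run time: one window of waiting between full rounds."""
    rounds = math.ceil(n_urls / (n_proxies * per_proxy))
    return max(0, rounds - 1) * window
```

With the numbers from this post, 20 proxies at 3 requests each is 60 URLs per round, so `min_total_seconds(70000, 20)` gives 69960 seconds, roughly 19.4 hours for one full pass even if every request is instant. More proxies shorten that proportionally.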
That’s the closest solution I have in mind. But the problem is I have no fucking idea how to build this, especially the proxy-changing technique.
What do you think, guys…? Any better ideas?
The last time I wrote this type of long thread was in Gulshan Kumar’s forum. So people who know me can already guess that I don’t write long posts unless I’m truly fucked or hurt.