I'm fucked.. :(

off-topic

#1

Yes. I’m fucked. :confused: Sorry, I couldn’t find a better headline than this. I’m truly fucked. :confused:

Here is the complete story. It all started in late 2016, when I was working on a bot to extract some quality information from a website’s API. The website is extremely reputable and No. 1 in its industry.

While I was writing the bot, their anti-bot firewall would not let me do my job properly. I should state that their API sits behind an extremely well-built firewall (Cloudflare “I’m Under Attack” DDoS mode) plus rate limiting. That means when an IP with a bad reputation visits the API, it has to verify that it is not a bot (a 5-second JavaScript check). Once you get through the firewall, you can only make 3 requests every 60 seconds. Their API docs state that the rate limit is dynamic and varies from IP to IP and request to request, but after debugging their API I found that in most cases I could only make 3 requests per 60 seconds.

Fortunately, while debugging their API I found a flaw: a loophole that let me bypass both the firewall and the rate limiting. That meant I could send hundreds of requests without being blocked or rate limited. :wink: I had been using this flaw from that day until yesterday.

Yesterday, while checking my bot’s logs, I found that it has not worked for 7-8 days. After debugging for a while I realized that those sons of bitches had patched the vulnerability. :cry:

I’m completely fucked because that’s 40% of my income. Without the bot I can’t extract the information. :confused: The bot is completely broken.

Solutions: After banging my head against the wall, I have one solution in mind, which is proxies.

First I should state that Tor is not an option. Cloudflare really hates the Tor network. Yesterday I tried running the bot over Tor and the result was pathetic.

Bypassing the Cloudflare DDoS protection is not a big deal. The real problem is the rate limiting. If I send more than 3 requests to the API from the same IP within 60 seconds, they throw a 429 error.

After googling for a while I could not find a reliable free proxy service, so I have to pay for a good one. The closest I could find is pubproxy. Their premium service is $12 a month, which I can bear.

Oh, I forgot to mention that I have to make at least 70k requests per run. Previously I was hitting their API with 70k requests every day, but around 6 months ago I switched to once every 3 days. Under the current circumstances I think I’m gonna run it every 7 days.

Now the plan is to completely rewrite the script so that every proxy hits 3 URLs and then the proxy is changed. Here is the structure for better understanding.

api_url_list.txt = 70k API URLs (one per line)
proxy_list.txt = let’s just say 20 proxies.

The script will load api_url_list.txt as an array and start fetching using a proxy from proxy_list.txt. Once a proxy has been used 3 times, the script will skip it and move on to the next one. This way the proxy list is used in a loop: by the time the script comes back around to a proxy, 60 seconds will have passed and it can make 3 more requests. The loop continues until every URL has been fetched.
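A rough sketch of that rotation and the arithmetic behind it, assuming the limit really is 3 requests per 60 seconds per IP and a list of 20 proxies as in the example. The function names are made up for illustration and nothing here sends a request:

```python
def assign_proxies(urls, proxies, per_proxy=3):
    """Yield (url, proxy) pairs, moving to the next proxy after `per_proxy` URLs.

    The proxy list is treated as a ring: after the last proxy has had its
    turn, the first one is used again.
    """
    for i, url in enumerate(urls):
        yield url, proxies[(i // per_proxy) % len(proxies)]


def min_hours(total_urls, proxy_count, per_proxy=3, window_seconds=60):
    """Lower bound on run time if every proxy is used at the assumed limit."""
    per_window = proxy_count * per_proxy          # requests per 60 s window
    windows = -(-total_urls // per_window)        # ceiling division
    return windows * window_seconds / 3600


if __name__ == "__main__":
    demo = list(assign_proxies(["u1", "u2", "u3", "u4"], ["p1", "p2"]))
    print(demo)
    print(min_hours(70_000, 20))
```

With 20 proxies that is at most 60 requests a minute, so 70k URLs would need a bit over 19 hours at best. The loop also only stays under the limit if one full pass over the proxy list takes at least 60 seconds; otherwise the script has to sleep before reusing a proxy.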

That’s the closest thing to a solution I have in mind. But the problem is I have no fucking idea how to build this, especially the proxy-changing technique. :confused: :cry:

What do you think, guys…? Any better ideas?

The last time I wrote a thread this long was in Gulshan Kumar’s forum. So people who know me can already guess that I don’t write long posts unless I’m truly fucked or hurt. :stuck_out_tongue:


#2

Update: I have posted threads on Reddit and Stack Overflow. Feel free to hop in.


#3

Better idea: if it’s legit use, try contacting them to get whitelisted.


#4

They will ban my ass out. :stuck_out_tongue:


#5

So it’s not legit use?
Maybe try a small self-hosted cluster of multiple hotspots with 3-4 different SIMs… That will be the easiest because their IPs change faster than a girl changes her clothes.


#6

Theoretically NO. Technically YES.
It’s not that I’m stealing private information. The API is public. The reason for the firewall and the rate limiting is to keep people like me away. No vendor will allow their APIs to be abused.

Self-hosting is not an option, since I have only one laptop and I have better things to do. I have to make do with a VPS.

There is also a silver lining… Their HTML pages present the same information, just in a different way. And to my knowledge their HTML pages are not firewalled the way their API is. The problem is that the HTML pages are big and slow as hell, so 70k URLs may take more than 7 days to complete! They’re also resource hungry.


#7

Invest in a raspberry pi and a couple of ₹500 JioFi devices.


#8

Why do you always give us solutions that no one in their sane mind will ever use? :stuck_out_tongue:

It’s not just this topic, but the previous one too… :stuck_out_tongue:

no offence…


#9

That’s simply because we can :wink:


#10

@Harry you can easily create your own VPN service for free, with many different IPs, using an AWS free account.


#11

And immediately get your account and IP banned by Amazon.


#12

Why??
Is it not allowed by them? DO allows it.


#13

They don’t have any issue with VPNs, only with their IPs getting banned.


#14

Update: I think I did it. I have only tested it on 50 links, and the script needs some final touches too.

Once that’s done, I will report back what I did to get it back to work again.


#15

Hey, folks. @iamhappy (very) :stuck_out_tongue: right now… Because my bot is back to work again… :smiley:

After bashing my head for 2 days and testing many algorithms and approaches from Reddit and Stack Overflow, it turns out there is a much, much simpler way to do this.

Previously I was planning to write an algorithm, but I wasn’t really happy with it. I even contacted a friend from Australia to help me with this; he gave me 3 approaches, but I wasn’t happy with those either.

While I was taking a shower an idea struck me (seriously… :stuck_out_tongue:)

Oh, at first I wanted to buy a premium proxy API, but it turns out I don’t need it thanks to a Python tool, ProxyBroker: an open-source tool that asynchronously finds public proxies from multiple sources. :smiley:

Here is the example script… :wink:

$db_list = file ( "test_db_1.txt" ); //array
$proxy_list = 'proxies.txt';

foreach ( $db_list as $url) {
	
	$url = trim( $url );
	
retry:

	// Read the first line of the proxy list, i.e. the proxy to use.
	$proxy = fgets( fopen( $proxy_list, 'r' ) );
	$proxy = trim($proxy);
	
	// If the list is empty, re-run the Python script to refill it.
	if ( $proxy == "" ) {
		
		//for windows
		//exec("python get_proxy.py");
		//for Linux
		exec("python3 get_proxy.py");
		
		goto retry;
		
	}
	
	$headers = array(
		"http" => array(
			'proxy' => 'tcp://' . $proxy,
			'request_fulluri' => true,
			'user_agent' => 'Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0',
			'timeout' => 10
		)
	);
	$context = stream_context_create( $headers );
	$content = @file_get_contents( $url, false, $context );
	
	if ( $content === FALSE ) {
		
		remove_first_line($proxy_list);
		goto retry;
		
	}
	
	//output.
	echo $content;
	
	
}

function remove_first_line($file) {
	
	$contents = file($file, FILE_IGNORE_NEW_LINES);
	$first_line = array_shift($contents);
	file_put_contents($file, implode("\r\n", $contents));
	
}

I have a file called get_proxy.py which finds proxies and saves them to proxies.txt.

A tiny 0.001% thanks to @Abhijeet for tweaking the script a little bit. :stuck_out_tongue:

Now, all I need is a cron. :wink:


#16

Legends say I deserve more


#17

@itsbhanusharma need an answer…

Let’s say I have 2 virtual hosts.

vhost_1 = max_child = 20, Memory = 256
vhost_2 = max_child = 50, Memory = 512

If vhost_1’s max_child or memory_limit is hit, can Apache kill any process in vhost_2?

I don’t know why, but my script stops working after a while (10-12 hours)… even though I have set time_limit = unlimited and my script does not even use 5 MB of memory!


#18

Once max child is reached, what I understand is that apache2 will freak out (like it always does), shut everything down, and restart only those that ask it again to allocate runtime.


#19

I suspected that too… because on my main server, where 30+ vhosts are deployed,
if just 1 vhost reaches max_child, Apache freaks out and breaks every vhost… :frowning:

So, should I deploy another server with native nginx? The catch is that it’s gonna cost me an extra $5/month. :thinking:

One more thing: could you please change the forum’s font? Because it’s shit. Consider using a system font stack, in which case Segoe UI will be used on Windows, which looks amazing, or Open Sans from Google.

Looks good, doesn’t it?


#20

@itsbhanusharma I also need a little help with a cron job.

Is it possible to set a cron job for specific dates in a month?

For example, I want to run a cron job on the 1st, 12th and 18th of every month.
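For what it’s worth, standard cron accepts a comma-separated list in the day-of-month field, so an entry such as `0 3 1,12,18 * * python3 bot.py` would run at 03:00 on those three days (the time and the `bot.py` command are placeholders, not something from this thread). A minimal Python sketch of how that field is matched, handling only `*` and plain comma lists, with a made-up function name:

```python
from datetime import date


def day_of_month_matches(field, day):
    """Return True if `day` is selected by a cron day-of-month field.

    Only "*" and comma-separated day numbers (e.g. "1,12,18") are handled;
    ranges and step values are left out of this sketch.
    """
    if field == "*":
        return True
    return day.day in {int(part) for part in field.split(",")}


if __name__ == "__main__":
    for d in (date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 12)):
        print(d.isoformat(), day_of_month_matches("1,12,18", d))
```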